Phonological and Articulatory Characteristics of Spoken Language*

Carol A. Fowler†



1. INTRODUCTION

Speaking may be our most impressive motor 
skill. We speak rapidly, and production of each 
word involves intricate sequencing and temporal 
interleaving of gestures for the component, 
ordered consonants and vowels of the word. The 
problem of understanding speech production at 
this level is that of understanding how speakers 
accomplish the feat of fluent consonant and vowel 
production. Solving that problem involves solving 
another one, however. It is to understand what 
speaking is essentially. That is, it is to understand 
how a series of complicated actions of a vocal tract 
can serve to convey a message composed of 
rulefully-patterned symbols to members of a
language community. In fact, the kind of solution
an investigator seeks to the problem of 
understanding how vocal-tract actions are 
executed depends on how the investigator looks at 
the relation between vocal-tract action and the 
linguistic message itself. 

Phonology is traditionally seen as the discipline
that concerns itself with the building blocks of 
linguistic messages. It is the study of the 
structure of sound inventories of languages and of 
the participation of sounds in rules or processes. 
Phonetics, in contrast, concerns speech sounds as 
produced and perceived. Two extreme positions on 
the relationship between phonological messages 
and phonetic realizations are represented in the 
literature. One holds that the primary home for 
linguistic symbols, including phonological ones, is 
the human mind, itself housed in the human 
brain. The second holds that their primary home 
is the human vocal tract.

Consider the first position and the conceptualization of speech
production to which it leads. For at least two reasons, the vocal tract
is rejected as a natural home for phonological segments of the language.
A philosophical reason is that phonemes are not the kinds of things that
can occur or exist outside the mind. They are ideas or concepts without
real-world actualizations. Articulatory gestures or their acoustic
consequences can serve as cues to phonological segments, but they cannot
be phonological segments.

"[Segments] are abstractions. They are the end result
of complex perceptual and cognitive processes in the
listener's brain." (Repp 1981, 1462)

"Phonological representation is concerned with
speakers' implicit knowledge, that is with
information in the mind...[Phonetic] representation...is
not cognitive because it concerns events in
the world rather than events in the mind."
(Pierrehumbert, in press)

A practical reason why phonological segments cannot occur in the vocal
tract is that linguistic symbols have other properties, aside from being
covert kinds of things, that preclude the vocal tract from representing
them veridically or even analogically. In particular, a central and
important fact about language is that its messages are composed of
discrete symbols. Phonological segments are discrete in the sense that
they do not overlap and blend. Moreover, until recently, they have been
represented in linguistic theories as if they were composed of lists of
coextensive (and by implication, cotemporal) features (cf. Chomsky &
Halle, 1968). The features themselves described static postures of the
vocal tract or their acoustic consequences; accordingly, the feature
lists of a word described a succession of vocal-tract or acoustic
snapshots. The vocal-tract actions that somehow convey a message to a
listener have none of those properties. Actions associable with a given
consonant or vowel do overlap and do appear to blend with actions of
neighbors. Actions identifiable with the component features of a
consonant or vowel are not cotemporal. Finally, fundamental units of
articulation appear to be actions, not postures; accordingly, time is
intrinsic to speech, rather than extrinsic as it is to the linguistic
message. One interpretation of these mismatches is that they reflect the
mismatch between the ideal of linguistic competence and the degraded
physical reality of linguistic vocal performance; the latter necessarily
is a considerable distortion of the former due to the limitations of
mechanico-inertial systems. This way of looking at speech production
promotes development of a kind of theory of the "how" of speech
production; such theories have been termed translation theories (Fowler,
Rubin, & Remez, et al., 1980). The mismatch between the character of the
planned message, presumably a sequence of linguistic symbols, and that
of its physical, phonetic, realization requires a translation over
stages of processing out of the ideal, mental, domain of the plan into
the real, physical-nonmental, domain of a vocal tract.

The other extreme perspective on the nature of speaking is that
consonants and vowels are actions of the vocal tract that have
linguistic, including phonological, significance in a language
community. They are, certainly, psychological actions that require
knowledge about them to be performed. However, the knowledge is not a
superior "ideal" that the actions cannot implement; rather, the
knowledge is about the actions, derived from perceptual and articulatory
experience with them. From this perspective, the mismatch between
linguistic segments and articulation described above is apparent rather
than real. It is the product of three kinds of error: 1. a mistaken
ascription of primacy to linguistic knowledge (competence) over
linguistic activity (performance); 2. an incorrect characterization of
phonological segments in linguistic theory; 3. an incorrect
characterization of the vocal tract actions of speech production. As to
the first "error," the argument is that we treat language differently
from other human creations when we decide that its components exist only
in the mind. Other human creations include, for example, automobiles,
baseball games and musical pieces. Automobiles definitely exist in the
world and so do baseball games and musical pieces when they are played.
What is in the mind of those who know about automobiles, baseball and a
musical piece, is only what they know about those things; it is not the
things themselves. If linguistic concepts are like these other concepts,
they are knowledge about real-world objects or events; the events have a
psychological nature: in this case, they are actions of the vocal tract,
identified as phonological segments. If the phonology in the mind of a
language user is what the user knows about the actions that implement a
linguistic message, then there need be no mismatch between knowledge and
action. If a phonological theory ascribes properties to phonological
segments as known that are impossible to realize in vocal-tract action,
then the first hypothesis should be that the theory is wrong, not that
vocal-tract action distorts components of linguistic competence. If
descriptions of vocal-tract actions include properties, such as
coarticulatory blending, that would distort the phonological message,
then the first hypothesis should be that the descriptions are wrong.
From this perspective, an important aim is to work on development of a
phonology that does not ascribe properties to phonological segments that
are unproduceable as vocal-tract action (cf. Browman & Goldstein, 1986;
Browman & Goldstein, 1989). A second aim is to find a perspective on
vocal-tract action from which macroscopic order is evident that conforms
to the phonological structure of spoken utterances (e.g., Fowler, Rubin,
& Remez, et al., 1980; Fowler, in press; Saltzman, 1986; Saltzman &
Munhall, 1989).

This theoretical perspective promotes a theory of speech production
different from a translation theory as outlined earlier. Speech
production does not involve a translation out of an ideal, mental domain
into a physical, nonmental, domain. Rather, the plan for a sequence of
phonological segments, physically instantiated in the brain, replicates
itself in a new physical medium, the moving vocal tract. A speech plan,
in some way, brings about vocal-tract actions having linguistic
significance.

In the remainder of this chapter, I pursue the 
different outlooks on a central aspect of speech 
production, coarticulation, that these different 
theoretical perspectives promote. I then consider 
the implications of our understanding of 
coarticulation for understanding another central 
aspect of speech production: the coordinated 
actions of the vocal tract that constitute token 
phonological segments. 

2. TWO PERSPECTIVES ON 
COARTICULATION 

All sources of evidence regarding speech production, whether they are
acoustic or articulatory, provide the same general picture of
context-sensitivity in speech production. An acoustic signal displayed
spectrographically or as a waveform, for example, can be divided into
phonological-segment sized regions (e.g., Klatt 1973) by identifying
acoustic properties that are more strongly associated with one
particular segment of an utterance than with others. For example, a stop
burst can be assigned to a stop in a stop-vowel utterance, and the
following voiced formants can be assigned to the vowel. Even so,
however, the display has not thereby been partitioned into phonological
segments or even into their acoustic consequences. This is so, in part,
because there may be no obvious place to locate a boundary separating
the acoustic consequences of one phoneme from those of another. For
example, the voiceless formant transitions following a voiceless stop
consonant in a consonant-vowel sequence belong with the stop, because
they are voiceless, but they also belong with the vowel, because
formants are characteristic of vowels and other sonorants (see, e.g.,
Peterson & Lehiste, 1960). Indeed, generally, there are no boundaries
between segments so that a partition leaves all and only the acoustic
information for one segment on one side of the boundary and all and only
that for another segment on the other side (cf. Fant & Lindblom, 1961).
Moreover, the overlap is not only in a potential boundary region.
Spectral analysis of the signal well within a domain associated with a
particular phonetic segment (well within the frication region for a
fricative or within the steady-state formants, if any, of a vowel, for
example) is likely to reveal influences of context. (I will use the term
"domain" to refer to the temporal region in which the features of a
segment dominate in articulation or in the acoustic signal. The domain
does not include the whole articulatory extent of a segment or the whole
region in which it influences the acoustic signal, but only the region
in which it is dominant; see also Löfqvist, 1990.)

Examination of the articulatory behaviors that give rise to acoustic
speech signals reveals a compatible picture. Articulatory movements can
be found that are identifiable with one of the phonetic segments in an
utterance (movement toward bilabial closure in a bV sequence, for
example). In addition, boundaries can be located around that movement.
In the example, a boundary may be located where closing is first
detectable and another at the point of release of the closure. Once
again, however, the boundaries are not boundaries between phonological
segments or their articulatory consequences so that all and only
movements associated with /b/ occur within the boundaries and movements
associated with other segments fall outside the boundaries. During the
closing and closure gestures, the tongue body will be conforming itself
to the requirements of the following vowel (e.g., Öhman 1966), and once
again, the movements within the boundaries are context-sensitive. For
example, the jaw moves to a higher point of maximum closing for /b/
followed by a high than by a low vowel (Keating, Lindblom, and Lubker,
cited in Keating, 1985).

Sources of context sensitivity are bidirectional. Effects of earlier
segments in the string extending beyond their domains of prominence are
termed "left-to-right," "perseverative" or "carryover" effects. Effects
of later segments are called "right-to-left" or "anticipatory."
Estimates of the coarticulatory field (that is, the interval of time or
the number of segmental domains affected by a segment in either
direction) vary considerably, but may be quite large. For example, Magen
(1989) reports anticipatory effects of V3 in V1CəCV3 sequences in
English. While some part of the carryover influences can be ascribed to
inertial properties of the vocal tract and to its inability
instantaneously to adopt a characteristic posture for one phonological
segment without exhibiting transitional movements between the postures,
anticipatory coarticulation cannot have that explanation, and carryover
effects are sometimes more extensive than can be realistically ascribed
to these mechanical factors (Daniloff & Hammarberg, 1973). These
considerations have suggested to many investigators that coarticulation
is planned. Generally, accounts of coarticulation diverge along the
theoretical lines distinguished in the introduction.

2.1 Coarticulation as assimilation by 
feature spreading 

In a translation theory, coarticulation serves an important function of,
indeed, translating a planned symbol string into a form more compatible
with the capabilities of vocal-tract action. (The role of phonetic rules
generally, according to Keating (1988a), is to make the linguistic
representation "more physical.")

One example of a theory in which coarticulation serves that function is
that of Daniloff and Hammarberg (1973). Daniloff and Hammarberg
described the phonological segments that serve as "input" in a plan to
speak as "canonical forms" (that is, "invariant, ideal, uncoarticulated
forms"), the phonological types of a linguistic theory. These forms
undergo "articulatory encoding" to tailor them to the vocal tract. The
encoding processes include application of context-sensitive rules of
feature spreading. An example they provide of such a rule is one that
spreads a rounding feature from a vowel to a preceding /ʃ/:
/ʃ/ → [+round] / ___ [+round, V]. By this rule, the /ʃ/ in "shoe," for
example, acquires the feature [+round] from its context, a following
rounded vowel. Generally (following Henke 1966), rules cause a feature
to spread in an anticipatory direction to any phonetic segment that is
"unspecified" for that feature. Feature values in phonological theory
generally are binary, and a segment may be "specified" for a feature,
having either a "+" or a "-" value of that feature. (Accordingly, a
rounded vowel is [+round] while an unrounded vowel is [-round].) To
count as an instance of a segment specified for some feature value, a
token occurrence of the segment must have the appropriate feature value;
changing the value may change one segment into another and hence, in a
sequence of phonemes, may change one word into another. These feature
values thereby serve a "contrastive" function in the language. At least
hypothetically, the contrastive feature values cannot be changed by a
feature-spreading coarticulatory rule. However, some features are
irrelevant to the identification of some segments. For example, in
English, rounding is not contrastive for consonants; accordingly, making
a consonant rounded does not change it from one consonant of English to
another. Consonants are said to be "unspecified" for rounding, and they
are subject to coarticulatory rules of feature spreading.
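
The look-ahead logic of such a rule can be made concrete with a minimal sketch. It is an illustration only, not Daniloff and Hammarberg's or Henke's own formalism: the toy inventory `ROUND_SPEC`, the function name `spread_rounding`, and the choice to let either value of [round] spread are all assumptions made here for exposition.

```python
from typing import Dict, List, Optional

# Hypothetical toy inventory (an assumption, not taken from the source).
# True = [+round], False = [-round], None = unspecified for rounding.
ROUND_SPEC: Dict[str, Optional[bool]] = {
    "u": True, "i": False,
    "s": None, "t": None, "k": None,
}


def spread_rounding(segments: List[str]) -> List[Optional[bool]]:
    """Anticipatory (right-to-left) spreading of the [round] feature.

    Each unspecified segment copies the value of the nearest following
    specified segment. Specified (contrastive) values are never changed.
    """
    values = [ROUND_SPEC[s] for s in segments]
    upcoming: Optional[bool] = None  # value of the next specified segment
    for i in range(len(values) - 1, -1, -1):
        if values[i] is None:
            values[i] = upcoming
        else:
            upcoming = values[i]
    return values


if __name__ == "__main__":
    # Consonants before /u/ acquire [+round]; before /i/ they do not.
    print(spread_rounding(list("istu")))
    print(spread_rounding(list("isti")))
```

Note that the output is categorical and uniform across each consonant: a segment either has the spread value throughout or does not.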

Evidence compatible with the feature-spreading theory includes findings
(or, perhaps, interpretations of findings; see 2.2) that lip rounding
anticipates a rounded vowel across any number of preceding consonants
(e.g., Daniloff & Moll, 1968; Benguerel & Cowan, 1974) and that nasality
anticipates a nasal consonant across any number of vowels uninterrupted
by oral consonants (Moll & Daniloff, 1971).

The simple characterization of coarticulation fails in several ways. One
is that the coarticulatory field very often does not respect boundaries
drawn between segments. That is, the hypothesis of feature spreading as
the sole source of coarticulation predicts that the spread feature
should be uniformly present throughout the production of the segment, at
least to the same extent that other features of the segment are present,
but that is generally not the case (e.g., Benguerel & Cowan, 1974;
Krakow, 1989). Second, the magnitude of effects of ostensibly spread
features is gradient rather than categorical. For example, Manuel and
Krakow (1984) found that a following front vowel raises and fronts the
(low, back) vowel /a/, but (front, high) /i/ raises it even more.
Likewise, Marchal (1988) reported graded effects of one stop consonant
on another in /kt/ sequences that suggested varying degrees of
coarticulatory overlap between them. A third problem is that
coarticulatory influences may affect realizations of specified features.
In Marchal's findings, just cited, coarticulatory influences occur
between stops specified for different places of articulation. A final
problem relates to the idea of underspecification. The problem here is
that segments considered to be unspecified for a feature involving some
articulator (say, rounding and the lips for English consonants, or
nasality and the velum for English vowels) are not wholly neutral with
respect to the demands they make on the articulator. Some consonants are
associated
with rounding movements of the lips (for example, /s/ and /ʃ/, among
others; Bell-Berti & Harris, 1982; Delattre & Freeman, 1968; Leidner,
1973).
Compatibly, vowels, ostensibly unspecified for nasality, are associated
with characteristic postures of the velum (Bell-Berti, 1980; Moll,
1962). Despite their not being wholly unspecified in terms of
articulatory control, they are subject to coarticulatory influences from
specified neighbors and they coarticulate with neighbors. For example,
the different velum heights associated with vowels of different heights
both influence velum height for neighboring consonants and are
recipients of coarticulatory influences from nasal consonants
(Bell-Berti, 1980). Accordingly, in contrast to the feature-spreading
account of coarticulation, coarticulatory influences occur in the
absence of any linguistic features to spread.

Recently, Keating (1988a,b) has proposed an alternative account of
specification and its role in coarticulation that preserves the idea of
coarticulation as a participant in a translation from the mental to the
physical domain of talking. She proposes that coarticulation includes
processes at two levels at least, one phonological and one phonetic. At
the phonological level, coarticulation is assimilatory feature
spreading. Since Keating's focus has been on phonetic coarticulation,
she simply alludes to this type of coarticulation without providing an
example. However, a possible example is provided by Daniloff and
Hammarberg (1973). They point out that in the word "width," there is,
apparently, a spreading of the interdental place of articulation of /θ/
to /d/ (which, by the way, is specified for a different place of
articulation; however, in this case, the feature change does not yield a
different phoneme of English). As for phonetic coarticulation, Keating
proposes a "targets and connections" model. In the model, phonetic
segments are associated with characteristic targets, and segments are
sequenced by interpolating between successive targets. A novel aspect of
Keating's idea of targets, however (but see Manuel 1987, for a similar
idea), is that the targets are regions ("windows"), rather than fixed
postures. Windows differ in their widths, and a target's instantiation
within its window will depend on its neighbors in that the speaker will
generally select the most efficient path from segment to segment that
passes through each target region. The idea of target windows replaces
the idea of underspecification as "categorial" with a gradient version.
A segment with the narrowest possible window for some feature is
"specified" for that feature value; one with the widest possible window
for a feature is unspecified. However, most segments have intermediate
target window sizes for their component features. Vowels have wider
windows for velum height than do nasal and oral consonants, but the
window is not as wide as possible. Accordingly, a vowel's window region
does affect the articulatory path through the target window of
neighboring segments.
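
A minimal sketch may help show how a path through target windows depends on context. It is not Keating's implementation: it assumes that "most efficient" means least summed squared movement between successive segments, the window values are invented, and the name `path_through_windows` is hypothetical.

```python
from typing import List, Tuple

Window = Tuple[float, float]  # (lowest, highest) permitted value, arbitrary units


def path_through_windows(windows: List[Window], sweeps: int = 200) -> List[float]:
    """Pick one value inside each window so that successive values differ little.

    Assumption: efficiency = minimal sum of squared differences between
    neighbors. Solved by coordinate descent: each value is repeatedly set to
    the mean of its neighbors, then clamped back into its own window.
    """
    x = [(lo + hi) / 2.0 for lo, hi in windows]  # start at window midpoints
    n = len(x)
    for _ in range(sweeps):
        for i, (lo, hi) in enumerate(windows):
            neighbors = []
            if i > 0:
                neighbors.append(x[i - 1])
            if i < n - 1:
                neighbors.append(x[i + 1])
            if not neighbors:
                continue
            target = sum(neighbors) / len(neighbors)
            x[i] = min(hi, max(lo, target))
    return x


if __name__ == "__main__":
    # Invented velum-height windows: oral consonant (narrow, high),
    # vowel (wide), nasal consonant (narrow, low).
    oral, vowel, nasal = (0.9, 1.0), (0.3, 0.8), (0.0, 0.1)
    print(path_through_windows([oral, vowel, nasal]))
    print(path_through_windows([oral, vowel, oral]))
```

Under these assumptions the same vowel window yields a lower velum value before a nasal than between two oral consonants, and the vowel's own upper limit constrains how the path leaves and enters its neighbors.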

This model handles the data of coarticulation considerably better than
does the feature spreading model of Daniloff and Hammarberg (1973); yet
it preserves the idea of coarticulation as among the processes that make
the planned utterance "more physical." The targets and connections model
is not obviously consistent with all of the data, however. In
particular, one finding that the model does not seem to handle well is
the ubiquity of coarticulatory fields that extend beyond immediate
neighbors. The targets and connections idea explains how contiguous
segments can be produced smoothly, but it does not readily predict
strong coarticulatory influences of a segment C on A in an ABC sequence.
Two other problems emerge below. One is that some coarticulation is
difficult to characterize as anything other than overlap (for example,
findings by Marchal 1988, cited above). A second is that a segment's
"aggressiveness" (here, having a narrow window) in its own domain
appears always to be associated with a compatible degree of
aggressiveness outside of its domain, frequently beyond any transitional
region between target regions.

2.2 Coproduction theories

A "coproduction" theory (Fowler, 1977) explains coarticulation as the
overlapping production of (to a first approximation) invariant sequences
of consonants and vowels. The context sensitivity apparent in the
acoustic signal and in articulation is not "deep" context sensitivity in
the sense that consonants or vowels have undergone assimilatory change
(as in a feature spreading theory). Rather it is a more peripheral
blending of consonants and vowels that are unchanged with respect to
their essential, specified, properties.

Öhman's (1966; 1967) theory provides a seminal example of such a theory
(but see also Kozhevnikov & Chistovich, 1965). In a spectrographic
analysis of V1CV2 disyllables, Öhman noticed many instances in which the
closing transitions into the consonant depended not only on V1, but also
on V2. Likewise, transitions following consonant release depended on
both vowels. X-ray tracings (see also Öhman 1967) showed clear evidence
that the tongue body conformation during C closure was different in the
context of different flanking vowels. Öhman (1966, 166) suggested that
the stop gestures were "superimposed" on a diphthongal vowel-to-vowel
gesture of the tongue body and that the "tongue is able to make a
distorted vowel gesture, while it is executing the stop consonant." More
speculatively, he proposed three neuromuscular systems for controlling
the tongue. The systems, though distinct, would use overlapping muscles.
One system, the apical system, is used to produce dental, alveolar and
retroflex consonants; the dorsal system produces palatal and velar
consonants, and the tongue body system produces vowels. During speech
production, a consonant and vowel system may be controlling the tongue
in overlapping time frames and the result is "a complex summation
(neural, muscular and probably mechanical also) of the responses to each
of the components of the instruction" (1966, 166). Öhman's observations
have been replicated many times. For example, Perkell (1969) noticed
that the /k/ constriction during /həkɛ/ consisted of a sliding movement
of the tongue dorsum toward the front location for /ɛ/. Compatible
evidence of vowel-to-consonant anticipatory and carryover coarticulation
and sometimes vowel-to-vowel coarticulation in VCVs is provided by Barry
and Kuenzel (1975), Butcher and Weiher (1976) and Carney and Moll
(1971).

These findings are not captured naturally in a feature spreading account
of coarticulation. The main reason is that they reveal the dynamic
nature of changing articulatory parameter values during speech. Consider
Perkell's finding just described. There is no change in a feature value
for /k/'s place of articulation that would yield a sliding place value.
The outcome is explained more naturally as a growing influence of /ɛ/'s
articulatory demands during /k/.

Öhman (1967) developed a quantitative model of vowel-consonant-vowel
coarticulation that did a satisfactory job of predicting the changing
vocal-tract shapes (as indexed by X-ray tracings) during VCV production.
Notably it includes a parameter value, k, and other parameters labeled
q, to implement consonant and vowel production respectively over time.
To implement the temporal articulatory domain of a consonant or vowel,
the associated parameter increases over time and then decreases. That
is, to generate coarticulatory influences of the vowel on the consonant,
for example, the vowel's influence on the vocal tract gradually waxes
and then wanes. Elsewhere, we have described this waxing and waning of a
segment's implementation over time as a "prominence curve" (Fowler &
Smith, 1986; see Löfqvist's (1990) similar idea of "dominance").
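
The waxing and waning just described can be sketched in a few lines. This is a loose illustration, not Öhman's published equations: the raised-cosine shape of the curve, the linear blend, and the names `prominence` and `blend` are assumptions chosen here only to show how a time-varying weight lets a consonant shape overlap a vowel shape.

```python
import math
from typing import List


def prominence(t: float, onset: float, offset: float) -> float:
    """Raised-cosine prominence curve (an assumed shape).

    Returns 0 outside the interval [onset, offset], rises smoothly to 1 at
    the midpoint, and falls back to 0.
    """
    if t <= onset or t >= offset:
        return 0.0
    phase = (t - onset) / (offset - onset)
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * phase))


def blend(vowel_shape: List[float], consonant_shape: List[float], k: float) -> List[float]:
    """Move each vocal-tract point from the vowel shape toward the consonant
    shape in proportion to the consonant's current weight k (0 to 1)."""
    return [v + k * (c - v) for v, c in zip(vowel_shape, consonant_shape)]


if __name__ == "__main__":
    vowel = [0.0, 0.0, 0.0]      # hypothetical vowel shape (arbitrary units)
    consonant = [1.0, 2.0, 0.5]  # hypothetical consonant shape
    for t in (0.0, 0.25, 0.5, 0.75, 1.0):
        k = prominence(t, onset=0.0, offset=1.0)
        print(t, blend(vowel, consonant, k))
```

At intermediate values of k the shape is a blend of vowel and consonant, which is the sense in which a gradient, sliding articulation can arise without any change of feature value.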

In light of this evidence favoring coproduction, let us reconsider the
data considered most supportive of feature spreading theory, evidence
that lip rounding anticipates across consonant strings unspecified for
rounding and that velum lowering anticipates across vowel strings
unspecified for nasality. Difficulties with the idea of
underspecification have already been cited. More than that, however,
work by Bell-Berti and her colleagues shows quite convincingly that the
error of accepting underspecification has led to considerable
overestimation of anticipation of velum lowering and lip rounding (see
also Boyce, Krakow, Bell-Berti, & Gelfer, 1990).

2.2.1 Anticipatory lowering of the velum for nasal
consonants

Consider the literature on nasalization first. Researchers typically
examined CVnN strings (where Ns are nasal consonants and the subscript
on the vowel signifies that different numbers of vowels intervened
between C and N). Velar lowering following C was taken as evidence for
onset of anticipatory nasalization from N (Moll & Daniloff, 1971).
However, Bell-Berti (1980) points out that vowels are associated with
lower velum heights than are oral consonants; accordingly the initial
drop of the velum will be due at least to the vowel; it may or may not
reflect an influence of N as well. That can be determined only by
comparing CVnN sequences with corresponding CVnC sequences. Such a
comparison indeed shows a lowering of the velum at the onset of a vowel
string in CVnC utterances that, of course, must be ascribed to the vowel
rather than to coarticulatory effects of a nasal consonant (Bell-Berti &
Krakow, 1991). When effects of the vowel are eliminated from velum
movements in CVnN utterances, findings are no longer consistent with
feature spreading theories. Rather, they suggest an invariant onset of
velum lowering relative to onset of nasal murmur in nasal consonant
production. Bell-Berti and Harris (1981) interpret the findings as
favoring a particular version of a coproduction theory, which they call
"frame theory," in which the onsets of component gestures of a phonetic
segment are staggered in a time-invariant way.
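
The contrast between the two predictions can be stated as a small sketch. The durations and the lead interval below are invented for illustration, and the function name `predicted_onsets` is hypothetical; the point is only that one account ties onset of velum lowering to the start of the vowel string, whereas the other ties it to the nasal consonant.

```python
from typing import List, Tuple


def predicted_onsets(vowel_durations_ms: List[float], lead_ms: float) -> Tuple[float, float]:
    """Predicted onset of nasal velum lowering in a CVnN string.

    Times are in ms relative to onset of the nasal murmur (time 0), so
    negative values precede the nasal.
    Returns (feature_spreading_onset, frame_onset).
    """
    # Feature spreading: lowering begins at the onset of the first vowel.
    feature_spreading_onset = -sum(vowel_durations_ms)
    # Frame account: lowering begins a fixed interval before the nasal
    # (lead_ms is an assumed, illustrative value).
    frame_onset = -lead_ms
    return feature_spreading_onset, frame_onset


if __name__ == "__main__":
    for vowels in ([100.0], [100.0, 120.0], [100.0, 120.0, 90.0]):
        print(len(vowels), predicted_onsets(vowels, lead_ms=80.0))
```

As vowels are added, the first prediction moves earlier while the second stays fixed, which is why control comparisons with CVnC strings are needed to tell the accounts apart.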

The findings by Bell-Berti and her colleagues also help to explain an
otherwise complicating finding by Bladon and Al-Bamerni (1982). Bladon
and Al-Bamerni had found evidence for two patterns of anticipatory
coarticulation of velum lowering: a one-step pattern of lowering, timed
consistently with predictions of feature-spreading theory (that is,
beginning at the onset of the first vowel in a string) and a two-step
pattern, the first step beginning at the onset of the first vowel and
the second, as frame theory predicts, an invariant interval before the
oral closing gesture for the nasal consonant. Bladon and Al-Bamerni were
unable to find anything systematically different in the contexts in
which each pattern was observed; therefore, they suggested that
selection among the strategies was unsystematic. An alternative
interpretation, however, is that sometimes the vocalic velum lowering
movement (always beginning near vowel onset) overlaps completely with
the lowering gesture for the nasal consonant, whereas at other times, it
follows velum lowering for the vowel. Bell-Berti and Krakow (1991; see
also Boyce et al., 1990) found increasing evidence of two- or
multi-stage velum lowering as vocalic segments were added before the
nasal consonant. Likewise, of their three talkers, one produced the
target words at a considerably faster rate than the others and that
subject showed a one-stage lowering pattern for all but the longest
vowel segments. Finally, one talker who produced the words at two rates
showed two- or multi-stage lowering only at the slower rate.

Overall, the findings on anticipatory velum lowering, originally
considered to provide strong evidence in favor of a feature spreading
theory of coarticulation, do not; rather, they provide better support
for the view that coarticulation is coproduction. Notice, too, that
Keating's targets and connections account must at least be modified to
fit the data. In particular, the model does not predict that target
windows for successive segments will overlap; however, the data just
described show convincingly that they do. That is, this model too must
admit the possibility of coproduction. Coarticulation is not wholly
finding the most efficient pathway from one target window to another;
sometimes windows overlap.

2.2.2 Lip rounding

The literature on lip rounding, like that on nasalization, has failed to
support the feature spreading account. Generally, it supports frame
theory. As Kent and Minifie (1977) pointed out, contradictory evidence
was available even in one study commonly cited as supporting feature
spreading, namely that of Benguerel and Cowan (1974). In their findings
more than half the time, rounding spread not only through a preceding
consonant string, but beyond it into a preconsonantal unrounded vowel.
Bell-Berti and Harris (1979) obtained similar results for both of their
speakers. The study by Bell-Berti and Harris (1979) and a later one
(1982) showed a generally invariant relation between onset of EMG
(orbicularis oris muscle of the lips) for a rounded vowel and measured
acoustic onset of the rounded vowel over a variable number of prevocalic
consonants.

The research by Bell-Berti and Harris tested for 
and found lip EMG activity for /I/, one of the 
consonants in the strings they used as stimuli. As 
noted earlier, other investigators have found 
rounding for other consonants. These consonantal 
influences on lip configuration are likely to have 
contaminated estimates of onset of lip rounding in 
the earlier research in the same way that the 
vocalic influences on velum height contaminated 
estimates of onset of velum lowering for nasal 
consonants. These contaminating influences can 
only be identified by examining control utterances 
that lack the specified segment (that is, VCnV 
utterances in which both vowels are unrounded), 
and investigators have not done that generally. 
However, using appropriate control utterances, 
Boyce (1988) has shown that overlapping 
consonantal and vocalic lip movements 
approximately add so that effects of consonants on 
the lips in an utterance such as /kuktluk/ can be
eliminated by subtracting the movement trace 
from /kiktlik/ from it. Whereas Boyce did not then 
test for the invariance of EMG onset relative to 
acoustic onset of the rounded vowel that Bell-Berti 
and Harris had reported earlier, she did find a 
clear intervocalic trough in lip movement activity 
and bimodal peaks of EMG activity in utterances
with two rounded vowels. The pair of findings
suggests that during the consonantal string /ktl/, 
rounding from the first vowel wanes while that for 
the second vowel increases. Hence there are two 
distinct rounding gestures that wax and wane in 
the consonantal string, just as Ohman's account
of vowel-consonant production proposed. There is 
not a spreading of a rounding feature from vowel 
to consonant. Compatibly, Gelfer, Bell-Berti, and 
Harris (1989) superimposed graphs of lip EMG
activity (orbicularis oris) for utterances such as
/ist#tu/ and /ist#ti/ having varying numbers of
intervocalic consonants and final /u/ or /i/. By
eliminating the activity common to both 
utterances, and hence due to the consonant string, 
they were able to identify the onset time of EMG 
activity associated with the rounded vowel itself. 
Onset times bore a near-invariant relation to 
release of the occlusion of the final consonant in 
strings of two or more consonants. 
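
The subtraction logic used by Boyce (1988) and by Gelfer et al. (1989) can be illustrated with synthetic data. The sketch below is only a toy: the Gaussian-shaped movement traces, their timings, and the function names are assumptions made for illustration, not measurements or procedures from either study. It assumes strict additivity of consonantal and vocalic lip movements, so that subtracting the unrounded control trace leaves the two vocalic rounding gestures with a trough between them.

```python
import math

def bump(t, center, width, height):
    """Smooth rise-and-fall movement (Gaussian-shaped), a stand-in for one gesture."""
    return height * math.exp(-((t - center) / width) ** 2)

def subtract_control(rounded_trace, control_trace):
    """Remove the consonantal contribution by pointwise subtraction."""
    return [r - c for r, c in zip(rounded_trace, control_trace)]

times = [i * 10 for i in range(61)]  # 0..600 ms in 10 ms steps

# Hypothetical lip-protrusion contributions (arbitrary units).
consonants = [bump(t, 300, 60, 0.4) for t in times]  # consonant string
vowel_1 = [bump(t, 120, 90, 1.0) for t in times]     # first rounded vowel
vowel_2 = [bump(t, 480, 90, 1.0) for t in times]     # second rounded vowel

# Additive assumption: the rounded utterance is the sum of all contributions,
# and the unrounded control utterance carries only the consonantal part.
rounded = [c + a + b for c, a, b in zip(consonants, vowel_1, vowel_2)]
control = consonants

# What remains after subtraction is the vocalic rounding alone.
vocalic = subtract_control(rounded, control)
```

In this toy case the recovered trace peaks near each vowel and dips during the consonant string, which is the pattern a two-gesture account predicts and a single spreading feature does not.
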
2^ lingual coarticulation 

The literature on coarticulation involving the 
tongue supports and augments the idea of 
coarticulation as gestural overlap. Ohman's model 
suggests that demands on the articulators made 
by a segment increase gradually over time and 
decrease gradually. The serial ordering of 
segments in articulation is maintained not by 
preserving discreteness of segment production 
along the time axis, but rather, perhaps, by
maintaining a serial ordering of their times of
maximum control in the vocal tract. In addition,
however, segments differ one from the other in the
strengths of demands they place on different
articulators (or on different articulatory systems;
see below under "Coordination" and cf. Keating's
idea of windows discussed above). The differences
in strength have an observable consequence that
is described differently (e.g., Farnetani, 1990)
depending on where it is observed. If discrete
domains are identified for segments in an
utterance by drawing boundaries at points where
coarticulating segments shift in their relative
dominance in the vocal tract, then one can say
that in their own domain, segments that make
strong demands on an articulator "resist"
coarticulatory influences from neighbors (Bladon
& Al-Bamerni, 1976); in the domains of near
neighbors, they exert a strong coarticulatory
influence. From the perspective of a coproduction
theory, resistance to coarticulation and a strong
coarticulatory influence covary because they are
really the same thing: namely, a segment's
exerting a relatively strong influence on 
articulators throughout its temporal domain. 
Recasens (1984; 1986; 1987; in press) has
conducted much of the work that has uncovered
variation in coarticulation resistance in movements
involving the tongue dorsum. In general,
resistance to coarticulation of a consonant or vowel is
associated with the amount of tongue dorsum-palatal
contact associated with production of the
segment (see also Farnetani, 1990). Compatibly,
using acoustic and electropalatographic measures,
Recasens (1984; 1987) found a decrease in
vowel-to-vowel coarticulation in VCV sequences in which
C is produced with considerable contact between
the tongue dorsum (an important articulator in
vowel production) and the palate. For example,
there is less V-to-C coarticulation across palatal /ɲ/
than across dental /n/. Compatibly, the vowel /i/,
which requires a constriction in the palatal region,
resists consonant-to-vowel coarticulatory influences
more so than do other vowels (Recasens, 1985),
and it resists vowel-to-vowel coarticulatory
overlap as well (Recasens, 1987; in press). In addition,
as noted earlier, segments such as /i/ that are
resistant to coarticulation in their own coarticulatory
domains themselves exert strong coarticulatory
influences on neighbors (see Tables II-VI in
Recasens, 1987; see also Butcher & Weiher, 1976;
Farnetani, Vagges, & Magno-Caldognetto, 1985).

It may be tempting to conclude from this
research that production of consonants and vowels
is context sensitive after all, in that coarticulatory
anticipation of V2 in a VCV sequence must be
delayed and reduced if V1 is /i/ as compared to /a/ or
if C is /ɲ/ as compared to /n/. However, possibly,
the planned segment can be invariant, while its
surface manifestations vary according to its
neighbor's patterns of coarticulation resistance.
Consider, by analogy, the different surface
consequences of an invariant squeezing action of the
hand depending on whether the hand is empty, or
else holding a sponge or a rock. The outcome at
the surface is different both in the extent to which
the hand (metaphorically, the segment being
produced) closes and in the extent that it deforms the
sponge (a little coarticulation resistance) and the
rock (a lot of resistance). Perhaps by the same
token, an invariant plan for a segment can have
different surface consequences if coarticulation
resistance is implemented as a real physical
variable in the vocal tract. There is one striking
outcome reported by Recasens (1984) that suggests
exactly that. He reported instances both of
anticipatory and of carryover coarticulation in which
coarticulatory effects were discontinuous. That is,
vowel-to-vowel effects were observed in VCV
sequences even though, for consonants with
considerable tongue dorsum/palatal contact, vowel-to-
consonant coarticulation was absent. It is unlikely
that talkers plan to begin production of V2 in V1,
to stop production of V2 during C, and to
recommence its production after C. An analogous plan
for carryover coarticulation is even less likely.

2.3 Some tentative conclusions about
coarticulation
The findings just reviewed suggest the following
summary. Each consonant or vowel of the
language is implemented by one or more vocal-
tract actions. Actions are of two varieties: gestures
(Browman & Goldstein, 1986) that are
linguistically significant (and contrastive) and
other, noncontrastive, ones that may occur
because they are easier to produce than to
suppress. Gestures for a segment may be timed or
phased invariantly one with respect to another, as
frame theory proposes. Each vocal tract gesture
has a prominence pattern of increasing then
decreasing articulatory strength, where
prominence refers to the extent to which the
gesture exerts an influence on the character of
movements in the vocal tract. Vocal tract actions
differ one from the other in relative strength so
that, for example, demands of /ɲ/ or /i/ on the
tongue dorsum-palate relation exceed those of /n/
and /a/. The extent to which a segment-specific
action influences what is happening in the vocal
tract at any point in time reflects the strength of
that action and its strength relative to that of
other ongoing actions affecting the same vocal-
tract structures. "Strength" appears to be
implemented in such a way that its effects arise at
the articulatory surface, not in differential
planning for a segment depending on its context.
The account is incomplete in a variety of ways,
lacking detail in important areas, including a
specification of how strength variations are
realized. It is also too simple in some respects. In
particular, patterns of relative timing of gestures
for a segment are not invariant; they may vary
over position in a syllable, as Krakow (1989) has
shown for the relative timing of velum lowering
and lip closing actions for syllable-initial and
-final /m/. They are likely to vary over stress and
rate manipulations as well. In short, the state of
the art in coarticulation research leaves
investigators still with many problems to tackle.
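
One way to make the summary concrete is a toy blending rule in which each gesture has a fixed target, a fixed strength, and a prominence that rises and then falls, and the articulator position at any moment is the strength-weighted average of the active targets. This is a sketch under stated assumptions, not a model proposed in the text: the Gaussian prominence function, the weighted-average rule, and all numerical values are hypothetical. It shows only that an invariant plan for V2 can surface with a smaller anticipatory effect when the intervening consonant is stronger.

```python
import math

def prominence(t, peak_time, spread):
    """Rise-then-fall activation of a gesture, between 0 and 1."""
    return math.exp(-((t - peak_time) / spread) ** 2)

def blended_position(t, gestures):
    """Weighted average of gesture targets; weight = strength * prominence."""
    weights = [g["strength"] * prominence(t, g["peak"], g["spread"])
               for g in gestures]
    total = sum(weights)
    return sum(w * g["target"] for w, g in zip(weights, gestures)) / total

def v2_influence_at_consonant(consonant_strength):
    """Shift in tongue-dorsum height at the consonant's peak caused by V2.

    Targets are hypothetical tongue-dorsum heights in arbitrary units.
    The plan for V2 is identical in every call; only C's strength varies.
    """
    gestures = [
        {"target": 0.2, "strength": 1.0, "peak": 100, "spread": 120},  # V1
        {"target": 0.5, "strength": consonant_strength,
         "peak": 250, "spread": 60},                                   # C
        {"target": 0.9, "strength": 1.0, "peak": 400, "spread": 120},  # V2
    ]
    with_v2 = blended_position(250, gestures)
    without_v2 = blended_position(250, gestures[:2])
    return with_v2 - without_v2

weak_c = v2_influence_at_consonant(1.0)    # e.g. a consonant with little dorsal demand
strong_c = v2_influence_at_consonant(5.0)  # e.g. a consonant with heavy dorsal demand
```

Here `strong_c` comes out smaller than `weak_c` even though nothing about V2 changed, which is the sense in which resistance can arise at the articulatory surface rather than in planning.
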

3. COORDINATION 
From the perspective of a coarticulating
segment encroaching on the domain of a second
segment, the second segment applies restrictions on
where and to what extent encroachment can occur
("coarticulation resistance"). Accordingly,
coarticulation by the same segment in the same
(anticipatory, carryover) direction will be
differentially manifested depending on the nature and
strength of the restrictions applied in its
coarticulatory field. Looked at from the perspective of the
influenced segment, however, the restrictions are
the segment's own identity; they are actions or
postures the achievement of which counts as
production of that segment. Somehow realization of
the segment correspondingly prohibits
contradictory actions. Here I examine implementation of
those restrictions in speech production.

The vocal tract includes large numbers of mus- 
cles and structures that the muscles move or de- 
form. Relative to the catalogue of movements that 
could occur were contractions of all possible com- 
binations of vocal tract muscles used and contrac- 
tions of all possible magnitudes, the movements 
that do occur in speech are limited in number and 
in kind. They are constrained, of course, to struc- 
ture the air so that listeners can hear them. But 
more than that, they are low-dimensional
movements: movements with order that spans groups
of muscles and groups of vocal-tract structures. 
They are, indeed, coordinated actions. 

Coordination achieves several things. Most
importantly, structures of the vocal tract work
together to achieve some end. For example, in
production of /b/, the jaw and lips work together to
achieve bilabial closure. The couplings among
structures also preclude actions that violate the
couplings; thereby they prohibit coarticulatory
influences that would prevent the goal of the
coordinative linkages from being achieved. They do
not completely eliminate variability or flexibility,
however. For example, bilabial closure is realized
with a variety of contributions by the jaw and lips.
When /b/ is coarticulated with an open vowel, the
jaw is lower during closure, and hence the lips do
more of the closing work, than when /b/
coarticulates with /i/.
Research using a perturbation procedure (e.g.,
Abbs & Gracco, 1984; Kelso, Tuller, Vatikiotis-
Bateson et al., 1984; Shaiman, 1989) helps to
expose couplings across structures of the vocal
tract. In one of these experiments, Kelso, Tuller,
Vatikiotis-Bateson, and Fowler (1984) asked
talkers to produce "It's a ___ again," with /bæb/ or
/bæz/ serving as target syllable. On a low
proportion of trials, randomly selected, during the
closing gesture for the second /b/ in /bæb/ or for
the /z/ in /bæz/, the talker's jaw was unexpectedly
braked, preventing its normal contribution to
closure for the consonantal constriction. On
perturbed relative to unperturbed trials, within
20-30 ms of the perturbation in /bæb/, the
orbicularis oris muscle of the upper lip showed
extra activation and, by achievement of closure,
the lip had moved farther down than on
unperturbed trials. If the jaw was braked during
closing for /z/, extra activation was observed in the
genioglossus muscle of the tongue, allowing the
tongue to compensate for the unusually low
position of the jaw. The upper lip did not show the
same extra downward movement on /z/-perturbed
trials that it showed on /b/-perturbed trials. Other
research (Shaiman, 1989) shows that when an
articulator of the vocal tract is perturbed that is not
involved in a consonantal closing gesture, closing
on perturbed and unperturbed trials is alike. In
short, the responses to perturbation are adaptive
and they reveal a coupling among selective
articulators of the vocal tract that jointly achieve some
phonetic gestural end. Coupled structures and
their neuromuscular underpinnings are known as
"synergies" or "coordinative structures." Whereas
Löfqvist (1990) suggests that there are no
dynamic perturbations in speech analogous to a jaw
pull, perhaps there are. Coarticulatory
encroachments from low vowels can perturb a talker's jaw,
pulling it down during closure for /b/. Possibly,
then, the couplings serve two functions; they bring
about the coordinated action that constitutes a
linguistic gesture of the vocal tract, and they
permit only those coarticulatory encroachments that
will not prevent the gesture from being realized.
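
The compensation pattern in the bilabial case can be summarized as simple bookkeeping: the closing distance is fixed by the gesture's goal, and whatever the jaw fails to contribute is taken up by the lips. The sketch below is only that bookkeeping, under assumed values. The default shares, the millimetre figures, and the function name are hypothetical, and nothing here models the neuromuscular mechanism or its 20-30 ms latency.

```python
def closure_contributions(aperture_mm, jaw_limit_mm=None, shares=(0.5, 0.3, 0.2)):
    """Split a required closing distance among jaw, lower lip, and upper lip.

    shares: hypothetical default proportions (jaw, lower lip, upper lip).
    jaw_limit_mm: maximum jaw contribution when the jaw is braked, else None.
    """
    jaw_share, lower_share, upper_share = shares
    jaw = aperture_mm * jaw_share
    if jaw_limit_mm is not None:
        jaw = min(jaw, jaw_limit_mm)  # the brake caps what the jaw can do
    remaining = aperture_mm - jaw     # the lips must cover the rest
    lip_total = lower_share + upper_share
    lower = remaining * lower_share / lip_total
    upper = remaining * upper_share / lip_total
    return {"jaw": jaw, "lower_lip": lower, "upper_lip": upper}

unperturbed = closure_contributions(10.0)
perturbed = closure_contributions(10.0, jaw_limit_mm=2.0)
```

In both cases the three contributions sum to the full closing distance; on the perturbed trial the upper lip's share grows, which is the direction of the effect reported for /b/. The same lowered jaw during /b/ before an open vowel could be represented by the same cap.
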

The short-latency responses to the perturbations
suggest that the couplings are low-level. That is,
they are not cognitive couplings, but rather
neuromuscular ones. This may help to rationalize
findings by Recasens summarized earlier of
discontinuities in coarticulatory influences.
Whereas it would be surprising for speakers to
plan for V-to-V coarticulatory influences, yet plan
for no V-to-C influences in a VCV sequence, the
finding of discontinuities in coarticulation is less
surprising if segments are planned to have an
invariant coarticulatory field that then gets
differentially suppressed by other synergies active
in the vocal tract.

Following Browman and Goldstein (1986; 1989),
we may call the vocal tract actions of a synergy a
"phonetic gesture" or, more simply, a "gesture."
Phonetic gestures are, then, linguistically
significant actions of the vocal tract. In the research
using the perturbation technique just described,
perturbations disrupted movements by one
articulator among two or more that participated in a
phonetic gesture. That is, perturbations and
compensations were interarticulatory, but intragestural.
However, some phonetic segments are defined by
more than one gesture, and the timing or phasing
between or among gestures may also be crucial to
the identity of the segment. For example, the
timing of an oral constriction gesture and a glottal
devoicing gesture determines whether a consonant
is preaspirated or aspirated (see, e.g., Löfqvist,
1980; Löfqvist & Yoshioka, 1984). Presumably,
then, intrasegmental gestures must be coupled
and one should see evidence of the coupling in
perturbation experiments. To date, there is little
evidence on the topic.

However, Munhall, Löfqvist, and Kelso (1988)
have perturbed the lower lip during closing from a
vowel to a /p/. The perturbation delayed
achievement of closure, thereby lengthening the vowel.
However, onset of glottal opening for /p/ was also
delayed, giving rise to a perceptually adequate
aspirated /p/. (Even so, there was disruption of the
coordinative relation between the gestures such
that the voice-onset times on perturbed trials were
unusually long.)

Another index, perhaps, of a coupling relation
between the gestures of a segment is provided by
tests for invariant relative timing (as summarized
in Löfqvist, 1990). Coupling between gestures of a
segment should give rise to invariance of relative
timing between the gestures so that, as the
segment is produced at various rates or with different
levels of stress, temporal intervals between
gesture onsets scale proportionately to changes in
other intervals produced by the coupled actions.
(The idea is that if the gestures are products of a
common synergy, and rate changes are achieved
by changes in a parameter that is common to the
synergy, all temporal intervals produced by the
gestures will scale proportionately.) Löfqvist
(1990) applied a test for proportionality of
intervals proposed by Gentner (1987) to several sets of
data including measures of intrasegmental-
intergestural intervals and intersegmental-
intergestural intervals. Whereas 90% of tests for
proportional changes in intervals over variation in
rate and stress were rejected in tests of the latter
intervals, just 33% were rejected in tests of the
former intervals. Löfqvist does not consider this
particularly strong support for the proportional-
durational test of coupling between gestures of a
segment, because the reason why 67% of tests
failed to reject the hypothesis of proportional
durations for intrasegmental-intergestural intervals
was not that intervals were relatively invariant,
but rather that they were extremely noisy (see
his Figures 11-16). Even so, his data do reveal
marked differences in the temporal relations
among gestures belonging to the same and to
different phonological segments, with the latter
relations showing systematic departures from the
proportional-duration hypothesis and the former
showing only unsystematic departures.
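
The logic of a proportional-duration test can be sketched as follows. If an interval scales proportionately with the whole, the ratio of interval to total duration stays constant across rates, so a regression of that ratio on total duration has a slope near zero. The code is an illustration in the spirit of such tests, not a reproduction of the procedure Löfqvist (1990) applied: the durations are invented, the function names are hypothetical, and a real analysis would add a significance test on the slope and would have to contend with the measurement noise discussed above.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def proportion_slope(intervals, totals):
    """Slope of (interval / total) against total; near zero under proportional scaling."""
    ratios = [i / t for i, t in zip(intervals, totals)]
    return slope(totals, ratios)

# Hypothetical durations in ms across five speaking rates.
totals = [200.0, 250.0, 300.0, 350.0, 400.0]
proportional = [0.4 * t for t in totals]     # interval scales with the whole
additive = [50.0 + 0.2 * t for t in totals]  # fixed component plus a scaled part

slope_proportional = proportion_slope(proportional, totals)
slope_additive = proportion_slope(additive, totals)
```

The first slope is zero up to rounding error, while the second is reliably negative: a systematic departure of the kind reported for intervals spanning different segments.
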

4. SPEECH DYNAMICS 
There is a new development in the study of
speech production that I will describe only briefly.
It is as yet relatively untried; however, it promises
to have a marked influence on research in the
field. Although speech production is remarkable
as a motor activity, it is not wholly unique. Some
common issues arise in investigations of a variety
of intentional motor skills. More fundamentally,
however, some theorists suggest that intentional
actions in general (Kugler & Turvey, 1987;
Kugler, Kelso, & Turvey, 1980) and speech
production in particular (Saltzman, 1986;
Saltzman & Kelso, 1987; Saltzman et al., 1989;
Kelso & Tuller, 1984) constitute a special instance
of "self-organization" in physical systems.
Accordingly, they may be best understood by
embedding their investigation in the larger
context of the study of self-organizing physical
systems. Complex physical systems that are open
to the flow of energy from the environment,
whether they are living systems or not, develop
macroscopic, low-dimensional, patterned and
stable activities that can be modeled as attractors
of just a few sorts. Most simply, a physical system
can be modeled as a "point attractor" if, when
perturbed, it tends to return to the same final
target, much as the vocal tract does if it is
perturbed during bilabial closure (e.g., Saltzman
& Kelso, 1987).
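
A standard textbook example of a point attractor is a critically damped mass-spring system, which settles on its equilibrium position from any starting point and after any transient disturbance. The sketch below simulates one in a single dimension. It is an illustration only: the stiffness, time step, perturbation size, and function name are assumptions, and the one-dimensional system is far simpler than the multi-articulator models cited in the text.

```python
def simulate_point_attractor(x0, target, stiffness=400.0, dt=0.001, steps=1000,
                             perturb_step=None, perturb_size=0.0):
    """Euler integration of a critically damped unit-mass spring toward `target`."""
    damping = 2.0 * stiffness ** 0.5  # critical damping for unit mass
    x, v = x0, 0.0
    trajectory = []
    for step in range(steps):
        if step == perturb_step:
            x += perturb_size         # sudden displacement, e.g. a mechanical pull
        accel = -stiffness * (x - target) - damping * v
        v += accel * dt
        x += v * dt
        trajectory.append(x)
    return trajectory

# Hypothetical lip aperture in mm closing toward 0 (bilabial closure).
free_run = simulate_point_attractor(10.0, 0.0)
pulled = simulate_point_attractor(10.0, 0.0, perturb_step=300, perturb_size=5.0)
```

Both runs end at the target even though the second is knocked 5 mm off course partway through; no replanning is needed, because the return to the target is a property of the dynamics.
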

Saltzman and colleagues have shown that many
central features of speech production, including
adaptive responses to perturbations and
consequences of coarticulatory overlap (see Saltzman &
Munhall, 1989), can be modeled if phonetic
gestures are modeled as dynamical systems. On the
other side, Tuller and Kelso (1990) have shown
that speech production exhibits some of the
central characteristic features of dynamical systems.
Finally, Browman and Goldstein (1986) have
developed an "articulatory phonology" whose
primitive units, phonetic gestures, are defined by
dynamical parameters of the vocal-tract point
attractors of Saltzman's articulatory ("task-
dynamic") model. Possibly, embedding the
investigation of speech production in the context of studies
of complex open physical systems generally will
help to deepen our understanding of synergies and
their achievement of low-dimensional, coordinated
actions. In turn, understanding of these physical
systems may literally add substance to the
linguist's concepts of phonological segments and
their features.
Characteristics of Speech as a Motor Control System



Vincent L. Gracco 



The structural and functional organization of any biophysical system provides potentially
important information on the underlying control structure. For speech, the anatomical and
physiological components of the vocal tract and the apparent functional nature of speech
motor actions suggest a characteristic control structure in which the entire vocal tract can be
viewed as the smallest functional unit. Sounds are coded as different relative vocal tract
configurations generated from neuromuscular specifications of characteristic articulatory
actions. Sensorimotor processes are applied to the entire vocal tract to scale and sequence
changes in vocal tract states. Sensorimotor mechanisms are viewed as a means to predictively
adjust speech motor output in the face of continuously changing peripheral conditions. An
underlying oscillatory process is hypothesized as the basis for sequential speech movement
adjustments in which a centrally-generated rhythm is modulated according to internal (task)
requirements and the constantly changing configurational state of the vocal tract.



Speaking is a complex action involving a 
number of levels of organization and repre- 
sentative processes. At a cognitive level, speaking 
represents the manipulation of abstract symbols 
through a synthesis of associative processes
expressed through a sophisticated linguistic
structure. At a neuromotor level, at least seven
articulatory subsystems can be identified (respiratory,
laryngeal, pharyngeal, lingual, velar, mandibular, 
and labial) which interact to produce coordinated 
kinematic patterns within a complex and dynamic 
biomechanical environment. At an acoustic level, 
characteristic patterns result from complex aero- 
dynamic manipulations of the vocal tract. The 
cognitive, sensorimotor and acoustic processes of 
speech and their interactions are critical compo- 
nents to understanding this uniquely human be- 
havior. As the interface between the nervous sys- 
tem and the acoustic medium for speech produc- 
tion/perception, speech motor processes constitute 
a direct link between higher level neurophysiolog- 
ical processes and the resulting aerody- 
namic/acoustic events. 
In the following chapter, characteristics of the
speech motor control process will be evaluated 
from a functional perspective emphasizing the 
structural and functional organization of the vocal 
tract and the timing characteristics associated 
with their continuous modulation. In contrast to 
perspectives which emphasize the large numbers 
of muscular/kinematic degrees of freedom, the 
current perspective is one that assumes that the 
overall vocal tract is the smallest unit of func- 
tional behavior. Sounds are encoded according to 
characteristic vocal tract shapes specified
neuromuscularly and modulated through sensorimotor
mechanisms to adapt to the constantly changing 
peripheral environment. Examination of the 
structural components and their interaction is 
consistent with this macroscopic organization as 
are a number of empirical observations. The func- 
tional organization is implemented by a limited 
number of sensorimotor control processes that 
scale overall vocal tract actions spatiotemporally 
within a frequency-modulated rhythmic organiza- 
tion characteristic of more automatic, innate mo- 
tor behaviors. 

Structural Properties 

In order to describe speech from the perspective
of a motor control system, a necessary step is to
identify the components of the motor system to
determine how their structural properties may
reflect on the overall functional organization. The
structures of the vocal tract include the lungs,
larynx, pharynx, tongue, lips, jaw, and velum.
Anatomically the vocal tract structures display
unique muscular architecture, muscular
connections, and muscular orientation that determine
their potential contributions to the speech
production process. For example, the orientation of the
muscles of the pharynx, primarily the pharyngeal
constrictors, is such that they generate a
sphincteric action on the long axis of the vocal tract,
producing a change in the cross-sectional area and
the tension or compliance of the pharyngeal
tissues. The muscles of the velum are oriented
primarily to raise and lower the soft palate,
separating the oral and nasal cavities. Perioral muscles
are arranged such that various synergistic muscle
actions result in a number of characteristic
movements such as opening and closing of the oral
cavity and protruding and retracting the lips.
Some of the components, such as the tongue and
larynx, can be subdivided into extrinsic and
intrinsic portions, each of which appears to be
involved in different functional actions. Intrinsic
tongue muscle fibers are oriented to allow fine
grooving of the longitudinal axis of the tongue and
tongue tip and lateral adjustments characteristic
of liquid and continuant sounds. Extrinsic tongue
muscles are arranged predominantly to allow
shaping of the tongue mass as well as elevation,
depression and retraction of portions of the
tongue. Intrinsic laryngeal muscles are arranged
to open and close the glottis reciprocally and
adjust the tension of the vibrating vocal folds, while
extrinsic laryngeal muscles are oriented to
displace the entire laryngeal complex (thyroid
cartilage and associated intrinsic muscles and
ligaments). Generally, movements of the vocal tract
can be classified into two major categories: those
that produce and release constrictions (valving)
and those that modulate the shape or geometry of
the vocal tract. The valving and shaping actions
are generally associated with the production of
consonant and vowel sounds, respectively
(Ohman, 1966; Perkell, 1969).

In addition to the structural arrangement of the
vocal tract muscles for valving and shaping
actions, mechanical properties of individual vocal
tract structures provide insight into the functional
organization of the speech motor control system.
The dynamic nature of the tissue load against
which the different vocal tract muscles contract is
extremely heterogeneous. For some structures
such as the lips and vocal folds, inertial
considerations are minimal, while for the jaw and
respiratory structures inertia is a significant
consideration. The tongue and lips are soft tissue structures
that undergo substantial viscoelastic deformation
during speech while the jaw and perhaps the lips
display a degree of anisotropic tension (Lynn &
Yemm, 1971). Even seemingly homogeneous
structures such as the upper and lower lips
display different stiffness properties (Ho, Azar,
Weinstein, & Bowley, 1982), possibly contributing
to their differential movement patterns (Gracco &
Abbs, 1986; Gracco, 1988; Kelso et al., 1984).
Considering the structural arrangement of the
vocal tract, the different muscular orientations and
the vast interconnection of muscles, cartilages,
and ligaments, it is clear that complex
biomechanical interactions among structures are the rule.
Passive or reactive changes in the vocal tract due
to inherent mechanical coupling are a consequence
of almost any vocal tract action, with the relative
significance varying according to the specific
structural components and conformational change
and the speed at which adjustments occur. As a
result, a single articulatory action may generate
primary as well as secondary effects throughout
the vocal tract. The examination of individual
articulatory actions is important to determine
their contribution to the sound producing process.
However, individual articulatory actions never
have isolated effects. The combination of the
viscoelastic properties of the tissues, the different
biomechanical properties of vocal tract structures,
and the complex geometry of the vocal tract
comprise a complex biomechanical environment. The
kinematic and acoustic variability characteristic
of speech production reflects in part the
differential filtering of neural control signals by the
peripheral biomechanics. Only through detailed
biophysical models of the vocal tract and
considerations of potential biomechanical interaction
associated with various phonetic environments can the
control principles of the speech motor control
system be separated from structural or
cognitive/linguistic influences.

Functional Organization

In order to characterize the speech motor control
system accurately, and pose the motor control
problem correctly, it is important to determine
how the behavior is being regulated. That is, are
the individual sound-influencing elements being
independently controlled or does the control
structure involve larger units of behavior, and if so,
what is the organizational structure? For speech,
the simple observation that even an isolated vowel
sound requires activity in respiratory muscles,
tension and adduction of the vocal folds,
adjustments in the compliance of the oropharyngeal
walls, shaping of the tongue, positioning of the
jaw, elevation of the velum, and some lip
configuration is rather convincing evidence that speech is
functionally organized at a level reflecting the
overall state of the vocal tract. It is the interaction
of all the neuromuscular components that provides
each speech sound with its distinct character, not
the action of any single component. The often-
cited fact that speech production involves over 70
different muscular degrees of freedom, while
perhaps anatomically factual, is a functional
misrepresentation of the motor control system
organization. As early as the birth cry and through the
earliest stages of speech development, the infant's
vocalizations involve the cooperative action of
respiratory, laryngeal, and supralaryngeal muscles to
produce sounds. A similar observation can be
made for locomotion in that rhythmic stepping
and other seemingly functional locomotion-like
behaviors can be elicited well before the infant
manifests upright walking (Thelen, 1985, 1986). It
appears that functional characteristics of many
human behaviors are present at birth or very
early in the infant's development, suggesting that
the "significant functional units of action"
(Greene, 1972) may be innate properties of the
nervous system. It is suggested that speech motor
development reflects the ability to make finer and
more varied adjustments of the vocal tract, not the
mastering of the articulatory or muscular degrees
of freedom.

As suggested above, the characteristics of 
speech as a motor control system include a control 
structure in which the smallest functional unit is
the entire vocal tract. Recent studies have 
demonstrated examples of large scale 
manipulation of vocal tract actions rather than 
the modulation of separate articulatory actions. 
As shown in Figure 1, movements of individual 
articulators such as the upper lip, lower lip, and 
jaw demonstrate timing relations such that 
adjustments in one structure are accompanied by 
adjustments in all functionally-related structures. 




Figure 1. Upper Lip (UL), Lower Lip (LL), and Jaw (J) movement velocities associated with the first "p" closing in
"sapapple." As the preceding vowel duration changes, the timing of the UL, LL, and J change in a consistent and
unitary manner (from Gracco, 1988). Calibration bars are 50 mm/sec (vertical) and 100 ms (horizontal).
The coordinative process reflects a constraint on
articulatory actions involved in the production of a
specific sound. Similar results can be observed for
other more spatially remote, but functionally
related articulators. As shown in Figure 2,
movements of the larynx and the lower lip demonstrate
a similar timing dependency for the production of
the /f/ in "safety." In order to generate the
frication noise characteristic of the /f/, the glottal
opening and labial constriction are appropriately
timed. As the timing of one structure changes, the
timing of the other functionally-related
articulatory action also changes. Similarly, for movements
associated with resonance producing vowel events,
timing constraints can be observed between
laryngeal voicing and jaw opening associated with
tongue positioning for a vowel (Figure 3). Here,
the laryngeal action associated with phonation
and the change in jaw positioning to assist the
tongue in vowel production demonstrate similar
coordinative interdependency. Some preliminary
evidence further suggests that certain
physiological changes associated with the production of
emphatic stress result in an increase in the actions
of all portions of the vocal tract rather than being
focused on one specific articulator (Fowler, Gracco,
& V.-Bateson, 1989). In the presence of a
potentially disruptive mechanical disturbance applied
to one of the contributing articulators there is a
tendency for the timing of all articulators to
readjust (Gracco & Abbs, 1988). The timing of
individual articulators is apparently not adjusted
singularly but reflects a system level organization
(see Löfqvist & Yoshioka, 1981; 1984; Tuller,
Kelso, & Harris, 1982; for other examples). It is
not clear how general these observations are with
regard to all speech sounds in all possible
contexts. For example, the lip/jaw and
laryngeal/supralaryngeal coordination observed in
Figures 1 and 2 is modified when the sound is at
the beginning of a word apparently reflecting a
change in the functional requirements of the task. 
The importance of these kinds of observations is 
not the specific observable pattern but the r^es- 
ence of characteristic patterns that are used for 
time-dependent articulatory adjustments. 
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LIP 



GLOTTIS 




2 mm 
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FigurtZ. Uwcr Up dodng and glottal opmi»gmov«n«m«forliii««i«prtiaon«tff Hit wid-M^- A»ttwlow«rlip 
clMing novcMnt ft 'r vuim, the lining of the glottal opening (dcvoidng) atoo variM {fnm Gneeo k. Ufqvut, 
im). Siadlar la Fi^ 1, ttM tefatg of Iht end and laiyngMl actiom appwtf to bs ad)«M^ 
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Figure 3. Timing ftlations b€twt«n tiie glottal dofing and (he jaw opening associated with the vowel in ''sip.'*' As the 
glottal opening/dosing associated with the and subsequent vowel varies, the jaw opening (noted by the 
downwaid movement) also varies proportionally (Iran Gncco 4c L5f qvist; 1989). 



Speech motor patterns reflect characteristic ways of
manipulating the vocal tract, in the presence of a constant
pressure source, to generate recognizable and
language-specific acoustic signals (Ohala, 1983). The
process through which such functional cooperation occurs has
been described for many motor tasks in various contexts with
the assumption that the control actions involve the assembly
of functional units of the system organized into larger
systems known as synergies or coordinative structures
(Bernstein, 1967; Fowler, 1977; Fowler, Rubin, Remez, &
Turvey, 1980; Gelfand, Gurfinkel, Tsetlin, & Shik, 1971;
Saltzman, 1979; 1986; Turvey, 1977; Kugler, Kelso, & Turvey,
1980; 1982). In keeping with the interactive structural
configuration outlined previously and the apparent
functional nature of the task itself, a modification of this
perspective is offered. Speaking appears to involve
coordinative structures (or synonymously motor programs; see
Abbs, Gracco, & Cole, 1984; Gracco, 1987) available for all
characteristic vocal tract actions associated with the sound
inventory of the language. It is not the case, however, that
a coordinative structure or a motor program is a process;
rather, it is a set of sensorimotor specifications
identifying the relative contribution of the vocal tract
structures to the overall vocal tract configuration (see
Abbs et al., 1984; Gracco, 1987). As such, coordinative
structures may be more rigidly specified than previously
thought and the distinction between a flexible coordinative
structure and a hard-wired motor program algorithm may be
more rhetorical than real (cf. Kelso, 1986 for discussion of
differences). In this regard, two observations are of note.
When the contribution of jaw movement is eliminated, by
placing a block between the teeth, jaw closing muscle
actions are still present (Folkins & Zimmermann, 1981).
Further, in response to jaw perturbation, both
functionally-specific responses and non-functional responses
are observed, such as upper lip muscle increases when the
subjects are not producing sounds requiring upper lip
movement (Kelso, Tuller, V.-Bateson, & Fowler, 1984;
Shaiman, 1989). Together, these observations reflect on
specific aspects of the speech motor control process and
suggest that speech production may rely to some degree on
fixed neuromuscular specifications. The presence of jaw
muscle actions when the jaw movement is eliminated is
consistent with the previous suggestion that speech motor
control is a wholistic process involving the entire vocal
tract. The presence of upper lip muscle increases (albeit
small) when the sound being produced does not involve the
upper lip reflects on the underlying control process. The
interaction of the phasic stimulus (from the perturbation)
with activated motoneurons will produce the
functionally-specific compensatory response. If the
motoneurons are inactive, or slightly active, the phasic
stimuli would result in small increases in muscle activation
levels without any significant movement changes. This is a
much simpler control scheme in that certain interactions and
functionally-specific responses are a consequence of the
activation of specific muscles and the actual synaptic
interactions of various vocal tract structures (Gracco,
1987). The advantage of this perspective is that certain
properties of speech production result from the
physiological organization and focus the functional
organization of the speech motor control system on the
neural coding of speech sounds and the characteristic
sensorimotor processes that modulate and sequence vocal
tract configurations.

Neural Coding of Speech Motor Actions 

The coding of speech is viewed as the process by which
overall vocal tract states are "represented and transformed
by the nervous system" (see Perkel & Bullock, 1968). This
coding is similar to what has previously been identified as
the selection of muscular components associated with a
specific motor act (cf. Evarts, Bizzi, Burke, DeLong, &
Thach, 1972). In the following, the selection of
characteristic vocal tract states will be evaluated with
respect to two components of the hypothetical specification
process, although the actual neural coding is viewed as a
single process and is only presented separately for the
purpose of clarity. As stated previously, the actions of the
vocal tract are designed to either valve the air stream for
different consonant sounds or to shape the geometry of the
vocal tract for different vowel and vowel-like sounds.
Considering the place of articulation for vowels and
consonants naturally results in categorical distinctions
which are apparent acoustically and aerodynamically
(Stevens, 1972). However, rather than dichotomizing these
apparently discrepant processes, it is suggested that
valving and shaping can be conceptualized as a single
physiological process. That is, speech sounds are coded
according to overall vocal tract states which include
primary articulatory synergies. When the appropriate muscles
are activated, the resulting force vectors create
characteristic actions resulting in vocal tract states which
act to valve the pressure or change the geometry without
creating turbulence producing constrictions. It is the
orientation of the activated muscle fibers, the activation
of synergistic and antagonistic muscles, and the fixed
boundaries of the vocal tract (the immobile maxilla) that
result in the achievement of characteristic shapes or
constriction locations; certain muscular synergies can only
result in certain vocal tract configurations. For example,
selection of certain upper and lower lip muscles
(orbicularis oris inferior and superior, depressor anguli
oris, mentalis, depressor labii inferior) will always result
in the approximation of the upper and lower lips for "p,"
"b," or "m." The magnitude or timing of the individual
muscle actions may vary, but bilabial closure will always
involve the activation of upper and lower lip muscles;
otherwise bilabial closure could not be attained. Similarly,
changing the focus of neural activation to regions
representing lower lip muscles (orbicularis oris inferior
and mentalis with primary focus in mentalis) results in
movements consistent with labiodental constriction for "f"
and "v" achieved against the immobile maxillary incisors
(Folkins, 1976). Different relative contributions of
extrinsic and intrinsic tongue muscles result in various
shapes and movements of the tongue tip, blade, and body,
resulting in characteristic constrictions or shapes as a
consequence. Constriction location and constriction degree
are useful categories to describe different speech sounds
because they specify what is distinctive to each phonetic
segment. Control over the vocal tract configuration through
the development of finer control over the neuromuscular
organization provides a more reasonable description of the
speech acquisition process because the entire vocal tract is
manipulated, not just the distinctive attributes for each
sound. The neuromotor differences in consonant and vowel
sounds appear to be reflected in other characteristics of
the control process.

One such characteristic involves the compliant states of the
vocal tract consistent with the level of tension in the
tissue walls. The importance of tissue compliance can be
inferred from a number of observations. A major physical
difference between voiced and voiceless consonants is in the
level of air pressure associated with their production.
Voiceless sounds are generally produced with higher vocal
tract pressures than their voiced counterparts. The pressure
difference, which has significant aerodynamic and acoustic
consequences, results from changes in the tension in the
pharyngeal and oral cavities as well as from pressure from
the lungs (Muller & Brown, 1980). For example, subjects
engaged in producing speech while simultaneously engaged in
a Valsalva maneuver (forcefully closing the glottis, thereby
eliminating the lung contribution) were able to maintain
voiced/voiceless intraoral pressure differences, apparently
resulting from changes in the overall compliance of the
vocal tract walls (Brown & McGlone, 1979). Together with
experimental evidence that kinematic and electromyographic
characteristics of lip and jaw movements are insufficient to
differentiate voiced and voiceless sounds (Lubker & Parris,
1970; Harris, Lysaught, & Schvey, 1965; Fromkin, 1966), it
appears that a major factor in generating voicing and
voicelessness is the specification of overall vocal tract
compliance. Two possible compliant states of the vocal tract
are sufficient to categorize most speech sounds: low
compliance associated with voiceless consonants and high
compliance associated with voiced consonants and vowels.
Compliant states of the vocal tract are associated with
gross changes in the activity of at least the pharyngeal
constrictors, as has been observed (Minifie, Abbs, Tarlow, &
Kwaterski, 1974; Perlman, Luschei, & DuMond, 1989), and
possibly other portions of the walls of the vocal tract
(intraoral cavity). The specification of low compliance
(resulting in high vocal tract pressures) would be
associated with increased activity in laryngeal muscles to
assist in the devoicing gesture, and high compliance
(resulting in low vocal tract pressures) would be associated
with a relaxation of the muscle activity in the pharyngeal
and oral cavities to allow cavity expansion for voiced stops
and continuants (Bell-Berti & Hirose, 1973; Westbury, 1983;
Perkell, 1969). Certain tense vowels may result from an
intermediate level of compliance (between high and low) such
that voicing is maintained but overall compliance is
slightly higher than for lax vowels. It is important to note
that modification in compliance is a process that produces a
relatively slow change in the state of the vocal tract, with
relaxation (high compliance) a slower process than
constriction (low compliance). Together, specification of
the compliant state of the vocal tract and selection of
specific muscular actions is one means by which the vocal
tract states may be neurally specified.

It should be noted, however, that the coding of speech motor
actions is viewed primarily as a static process in which
characteristic states of the vocal tract are identified
prior to their actual implementation. Considering some
dynamic properties of the speech motor control system
provides some insight into the manner in which different
sounds may acquire further acoustic and kinematic
distinction. For example, lip closing movement associated
with the voiceless bilabial stop "p" is generally but not
consistently associated with a higher velocity than the
voiced bilabial "b" or "m" (Chen, 1970; Gracco, submitted;
Summers, 1987; Sussman, MacNeilage, & Hanson, 1973). Lip and
jaw closing movements are initiated earlier relative to
vowel onset for voiceless "p" than for voiced "b" or "m"
(Gracco, submitted), resulting in shorter vowel durations.
One possible explanation is that voiceless sounds are
produced at a higher rate or frequency than their voiced
counterparts, reflecting a different underlying frequency
specification. Movement frequency is one dimension along
which different speech sounds can be generally categorized.
This hypothetical frequency modulation can be integrated
with another dynamic property of the control system. Not
only are closing movements generally faster for a voiceless
than for a voiced consonant, but the preceding opening
movement has also been observed to be faster (Gracco,
submitted; Summers, 1987). It appears that not only may
sounds be coded as a function of the frequency of individual
vocal tract adjustments but that the functional requirements
for specific sounds may be distributed across movement
cycles rather than focused on a single movement phase. This
observation suggests the operation of a look-ahead mechanism
(Henke, 1966) similar to or identical with the mechanism
underlying anticipatory coarticulation, which predictively
adjusts vocal tract actions. Speech motor control is a
dynamic neuromotor process in which overall vocal tract
compliance, the location of primary valving or shaping
synergies, and frequency-modulated motor commands are
specified by the immediate and future acoustic/aerodynamic
requirements.

Invariance, Redundancy, and Precision 

Before presenting some of the specific processes of the
speech motor control system that are used to modulate
overall vocal tract organization, two important and related
issues should be addressed: invariance and precision. The
search for invariance has a long and generally unsuccessful
history in investigations of speech production with the
obvious conclusion that invariance is not a directly
observable event (alternatively, the appropriate metric has
not been identified). From the perspective of speech as a
motor control system, a more fundamental issue is the
precision with which any quantity, variable, or vocal tract
configuration is regulated. The presence of substantial
acoustic, kinematic, electromyographic, and aerodynamic
variability suggests that the speech motor control process
operates with limited precision (or within rather broad
tolerance limits). The achievement of characteristic vocal
tract configurations or individual articulatory actions is
accomplished by a synthesis of general activation of most
vocal tract structures (setting of overall vocal tract
compliance) and focused activation of the relevant muscular
synergies. This is consistent with neurophysiological
evidence demonstrated in the studies of Kots (1975) in which
voluntary movement is seen as a synthesis of diffuse
excitation (pretuning), a more fixed and discrete increase
in motoneuron excitability (tuning), and the final
"triggering" process. Similarly, brain potentials prior to
the onset of muscle activity display rather diffuse
activation over multiple cortical areas for discrete finger
and toe movements (Boschert, Hink, & Deecke, 1983; Deecke,
Scheid, & Kornhuber, 1969) and involve larger regions for
production of speech (Curry, Peters, & Weinberg, 1978;
Larsen, Skinhøj, & Lassen, 1978). One plausible perspective
is that the nervous system modulates the focus of primary
activation but that this process is not punctate. That is,
activation and deactivation of cortical and perhaps
subcortical cells involve diffuse and slow changes in
activation or deactivation which result in distributed tonic
and phasic muscle activity. Specification of vocal tract
configurations for specific sounds may involve
characteristic patterns of activation and inhibition in all
vocal tract muscles with only slightly greater focus on
critical articulators involved in the more dominant or
sound-critical movements. In some cases muscles may be
partially activated just because of the proximity of their
motoneurons to other activated motoneurons. One conclusion
is that the neural processes underlying speech motor control
are broadly specified and that the functional speech
production goals (and the requisite perceptual properties)
are only categorically invariant. As suggested by the
apparent quantal nature of speech (Stevens, 1972), as long
as the articulatory patterns are within a certain range
(have not made a category change), the corresponding
phonetic properties will be perceived, with kinematic
variations producing very little perceptual effect. Perhaps
speech perception and production should be appropriately
represented as stochastic processes based on probability
statements implemented through an adequate but imprecise
control system. Strict determinism, invariance, and
precision are most likely relegated to man-made machines
working under rigid tolerance limits or simplified
specifications, not to complex biological systems.

Sensorimotor Control Processes 
Similar to the temporal organization for speech, spatial
interactions are evident that reflect multiarticulate
manipulations to achieve characteristic vocal tract states.
The clearest examples of cooperative and
functionally-relevant spatial interactions are observed when
one articulator, such as the lip or jaw, is disturbed during
speaking. Following the application of a dynamic
perturbation impeding the articulatory movement, a
compensatory adjustment is observed in the articulator being
perturbed as well as other functionally-related,
spatially-distant articulators (Abbs & Gracco, 1984; Folkins
& Abbs, 1976; Gracco & Abbs, 1988; Kelso et al., 1984;
Shaiman, 1989), reflecting the presence of afferent
dependent mechanisms in the control of speech movements. The
distributed compensatory response to external perturbations
is a direct reflection of the overall functional
organization of the speech motor control process and is
comparable to other sensorimotor actions observed for other
motor behaviors such as postural adjustments (Marsden,
Merton, & Morton, 1981; Nashner & Cordo, 1981; Nashner,
Woollacott, & Tuma, 1979), eye-head interactions (Bizzi,
Kalil, & Tagliasco, 1971; Morasso, Bizzi, & Dichgans, 1973),
wrist-thumb actions (Traub, Rothwell, & Marsden, 1980), and
thumb-finger coordination (Cole, Gracco, & Abbs, 1984).
Changing the size of the oral cavity with the placement of a
block between the teeth similarly results in compensatory
changes in articulatory actions resulting in
perceptually-acceptable vowel sounds (Lindblom, Lubker, &
Gay, 1979; Fowler & Turvey, 1980). It appears that the
speech motor control system is designed to achieve
functional behaviors through interaction of ascending
sensory signals with descending motor commands.

Human and nonhuman studies have shown that sensory receptors
located throughout the vocal tract are sufficient to provide
a range of dynamic and static information which can be used
to signal position, speed, and location of physiological
structures on a movement to movement basis (cf. Munger &
Halata, 1983; Dubner, Sessle, & Storey, 1978; Kubota,
Nakamura, & Schumacher, 1980; Landgren & Olsson, 1982 for
reviews). Studies using perturbation of speech motor output
indicate that the rich supply of orofacial somatic sensory
afferents has the requisite properties to interact with
central motor operations to yield the flexible speech motor
patterns associated with oral communication (Abbs & Gracco,
1984; Gracco & Abbs, 1985; Gracco & Abbs, 1988; Kelso et
al., 1984). Because of the constantly changing peripheral
conditions during speaking, the absolute position of vocal
tract structures can vary widely depending on the
surrounding phonetic environment. The speech motor control
system apparently adjusts for these movement to movement
variations by incorporating somatic sensory information from
the various muscle and mechanoreceptors located throughout
the vocal tract. Considerations outlined elsewhere (Gracco,
1987; Gracco & Abbs, 1987) suggest that the speech motor
control system appears to use somatic sensory information in
two distinct ways: in a comparative manner to feed back
information on the attainment of a speech goal, and to
predictively parameterize or adjust upcoming control
actions. Structurally, there is strong evidence for the
interaction of sensory information from receptors located
within the vocal tract with speech motor output at many if
not all levels of the neuraxis (cf. Gracco, 1987; Gracco &
Abbs, 1987 for a summary of the vocal tract representation
in multiple cortical and subcortical sensory and motor
regions). Further, brain stem organization, evidenced by
reflex studies, demonstrates a range of complex interactions
in which sensory input from one structure such as the jaw or
face is potentially able to modify motor output from lip and
tongue as well as jaw muscles (Bratzlavsky, 1976; Dubner et
al., 1978; Smith, Moore, Weber, McFarland, & Moon, 1985;
Weber & Smith, 1987). It appears that there are multiple
synaptic interactions possible throughout the neural system
controlling the vocal tract, with the specific interaction
dependent on how the system is actively configured.

Speech motor actions involve the activation or inactivation
of various muscles of the vocal tract which are adjusted
based on the peripheral conditions and the specific phonetic
requirements. An important question related to the neural
representation for speech is the character of the underlying
activation process for different articulatory actions. A
number of recent studies, evaluating the kinematic
characteristics of different articulators, are consistent
with a single sensorimotor process to generate a variety of
articulatory actions. One method for evaluating the
similarity in the underlying representation for multiple
speech sounds and their associated movement dynamics is to
compare the geometric (normalized) form of velocity
profiles. A change in velocity profile shape accompanying
experimental manipulation of phonetic context suggests a
change in the movement dynamics, and by inference a change
in the underlying neural representation. Conversely, a
demonstration of trajectory invariance or scalar equivalence
for a variety of movements suggests that different movements
can be produced from the same underlying dynamics (Atkeson &
Hollerbach, 1985; Hollerbach & Flash, 1982). That is, in
order to produce movement variations appropriate to
peripheral conditions and task requirements, it may be
necessary only to scale the parameters of a single
underlying dynamical relation; a much simpler task and, by
inference, a simpler neural process. For movements of the
vocal folds, tongue, lips, and jaw during speech it has been
shown that changes in movement duration and to a lesser
extent movement amplitude reflect a scaling of a base
velocity profile (Gracco, submitted; Munhall, Ostry, &
Parush, 1985; Ostry & Cooke, 1987; Ostry, Cooke, & Munhall,
1987; Ostry & Munhall, 1985). A scalar relation across a
class of speech sounds involving the same articulators,
maintained for different initial conditions (different vowel
contexts), suggests that the neural representation has been
maximized and such a representation might reflect a basic
component of speech production. That is, all speech
movements may involve a simple scaling of a single
characteristic dynamic (force-time) relationship (Kelso &
Tuller, 1984) with the kinematic variations reflecting the
influence of biomechanical and timing specifications. In
addition, specification of control signals in terms of
dynamics eliminates the need to specify individual movement
trajectories since the path taken by any articulator is a
consequence of the dynamics rather than being explicitly
specified (see Kelso et al., 1984; Saltzman, 1986; Saltzman
& Munhall, 1989). The scaling of individual actions appears
to be another characteristic process that eliminates the
need to store all possible phonetic variations explicitly.
Rather, the control process is a scaling of characteristic
motor patterns adjusted for endogenous conditions (speaking
rate, emphasis, upcoming functional requirements) and the
surrounding phonetic environment (sensorimotor adjustments).
The classic central-peripheral, motor program-reflex
perspectives have given way to more reasonable and realistic
issues including when and how sensory information may be
used and how the different representations are coded for the
generation of all possible speech movements.
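
The comparison of normalized velocity profiles described
above can be illustrated with a short sketch. The Python
below is an illustration only and is not taken from any of
the cited studies: the squared-sine bell shape, the
amplitudes and durations, the 50-point resampling grid, and
all function names are assumptions chosen for the example.
It rescales two synthetic closing movements of different
duration and peak velocity to unit time and unit peak, and
then measures how far the normalized shapes differ.

```python
import math


def normalize_profile(velocity, n_points=50):
    """Rescale a sampled velocity profile to unit peak and unit duration.

    The profile is divided by its peak magnitude and linearly resampled
    onto n_points equally spaced points of normalized time.
    """
    peak = max(abs(v) for v in velocity)
    if peak == 0:
        raise ValueError("velocity profile is identically zero")
    last = len(velocity) - 1
    resampled = []
    for i in range(n_points):
        # Position of the i-th normalized time point in original samples.
        pos = i * last / (n_points - 1)
        lo = min(int(math.floor(pos)), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        value = velocity[lo] * (1 - frac) + velocity[hi] * frac
        resampled.append(value / peak)
    return resampled


def bell_profile(amplitude, duration, dt=0.001):
    """Hypothetical bell-shaped velocity profile (assumed squared sine)."""
    n = int(round(duration / dt)) + 1
    return [amplitude * math.sin(math.pi * k / (n - 1)) ** 2 for k in range(n)]


def max_shape_difference(a, b):
    """Largest pointwise difference between two normalized profiles."""
    return max(abs(x - y) for x, y in zip(a, b))


# Two synthetic movements: a slower, smaller one and a faster, larger one.
slow = normalize_profile(bell_profile(amplitude=80.0, duration=0.20))
fast = normalize_profile(bell_profile(amplitude=150.0, duration=0.12))
difference = max_shape_difference(slow, fast)
print(f"max difference between normalized shapes: {difference:.5f}")
```

When two movements are scalar variants of one base profile,
as in this synthetic case, the difference is close to zero;
a clearly larger value would point toward a change in the
underlying dynamics.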

Movement Sequencing

A significant characteristic of many motor behaviors such as
speech, locomotion, chewing, and typing is the production of
sequential movements. Observations that interarticulator
timing is not disrupted following perturbation (Gracco &
Abbs, 1988), that speech rate can be modulated by changes in
sensory input (Gracco & Abbs, 1989), and that perturbation
induces changes in speech movement duration (Gracco & Abbs,
1988; Lindblom et al., 1987) are consistent with an
underlying oscillatory mechanism for speech. Further,
somatic sensory-induced changes in the timing of oral
closing action (due to lower lip perturbation) are
consistent with an underlying oscillatory process (Gracco &
Abbs, 1988; 1989). Qualitative observations of temporal
consistency of sequential movements are also consistent with
an underlying oscillatory or rhythm generating mechanism.
Presented in Figure 4 are 24 superimposed movements of the
upper lip, lower lip, and jaw for the sentence "Buy Bobby a
Poppy."

[Figure 4: superimposed movement traces aligned to "Buy Bobby a Poppy"]

These repetitions were produced as part of a larger study
and were produced at different times during the experiment.
The subject produced one repetition per breath and each
repetition was produced at a comfortable subject-defined
rate. As can be seen, there is a consistency to the
repetitions that suggests an underlying periodicity
indicative of a rhythmic process. A few studies attempting
to address the periodicity and apparent rhythmicity of
speech have demonstrated the presence of some form of
underlying frequency generating mechanism. Ohala (1976)
recorded over 10,000 jaw movements within a 1.5 hour period
of oral reading and was able to identify frequencies ranging
from 2-6 Hz with significant durational variability. Kelso
et al. (1985), using reiterant productions of the syllable
"ba" or "ma," demonstrated a rather strong periodicity at
approximately 5-6 Hz with minimal durational variability.
The findings of Kelso et al. (1985) are consistent with an
underlying oscillatory process. In contrast, the range of
frequencies found by Ohala (1976) may reflect the frequency
modulation associated with the sounds of the language, a
factor minimized in the Kelso et al. (1985) study. The
modulation of frequency, dependent on specific aerodynamic
properties of the specific sounds and surrounding
articulatory environment, may be a mechanism underlying the
speech movement sequencing (see also Saltzman & Munhall,
1989 for further discussion of serial dynamics). The fact
that the frequency values reported by Kelso et al. (1985)
were similar for "ba" and "ma" suggests that vowels may be a
major factor in determining the local periodicity. However,
it is the case that the individual movements or movement
cycles are not the same; local frequencies are different
depending on the phonetic context.
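
One way such a periodicity might be estimated from a
movement record is sketched below. This is a hedged
illustration, not the analysis used by Ohala (1976) or Kelso
et al. (1985): the synthetic jaw signal, its 5.5 Hz and 2 Hz
components, the 200 Hz sampling rate, and the function name
dominant_frequency are assumptions made for the example. The
sketch scans candidate frequencies and returns the one
carrying the most spectral power.

```python
import math


def dominant_frequency(signal, sample_rate, f_min=1.0, f_max=10.0, step=0.1):
    """Return the candidate frequency (Hz) with the largest spectral power.

    A direct Fourier projection is computed at each candidate frequency
    between f_min and f_max; the mean is removed first so that a constant
    offset in articulator position does not dominate.
    """
    mean = sum(signal) / len(signal)
    centered = [s - mean for s in signal]
    best_freq, best_power = f_min, -1.0
    n_steps = int(round((f_max - f_min) / step))
    for k in range(n_steps + 1):
        freq = f_min + k * step
        omega = 2 * math.pi * freq / sample_rate
        re = sum(x * math.cos(omega * i) for i, x in enumerate(centered))
        im = sum(x * math.sin(omega * i) for i, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq


# Synthetic jaw displacement (assumed values): a strong 5.5 Hz syllable
# rhythm plus a weaker 2 Hz component, sampled at 200 Hz for 4 seconds.
sample_rate = 200
n_samples = 4 * sample_rate
jaw = [
    3.0 * math.sin(2 * math.pi * 5.5 * i / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 2.0 * i / sample_rate)
    for i in range(n_samples)
]
peak_hz = dominant_frequency(jaw, sample_rate)
print(f"dominant frequency: {peak_hz:.1f} Hz")
```

For natural speech, where local frequencies vary with
phonetic context, a single spectral peak of this kind would
be expected to broaden rather than disappear.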

In addition, speech production involves many of 
the same muscles as such automatic behaviors as 
breathing, chewing, sucking, and swallowing. It 
has been suggested that the mechanisms underly- 
ing speedi may incorporate, to some degree, the 
same mechanisms as more automatic motor 
behaviors but ad^ited for the specialized function 
of communication (Evarts, 1962 Gracco Abbs, 
1988; Grillner, 1982; Kelso, Tuller, & Harris, 
1983; Lund, i^penteng, Seguin 1982). Few 
studies have focused specifically on the similarity 
of speech with more innate, ^aythmic motor 
behaviors (Moore, Smith, & Ringel, 1988; Ostiy 
Flanagan, 1989) with mixed interpretations. 
Recent experiments and theor^cal perspectives 
on the organization of central pattern generators 



for rhythmic behaviors such as locomotion, 
respiration and mastication suggest a more 
flexible conceptualization of the possible 
behavioral outputs than has previously been 
envisioned for the neural control of rhythmic 
behaviors (see (}ohen, Bossignol, & Grillner, 1988; 
(Setting, 1989 for reviews). For example, in vitro 
results suggest that the central pattern generator 
for respiration may more appropriately be 
considered as two separate but interrelated 
functions; one generating the rhythm and one 
generating the motor pattern (Feldman, Smith, 
McCTrimmon, Ellenbeiger, & Speck, 1988). The 
in^lication for other rhythmic and quasi-rhythmic 
behaviors sudi as speech, is that each function 
can be modulated ind^)endently thus generalizing 
the concept of a central pattern generator to a 
wider range of behaviors. Kecentty, Patla (1988) 
has suggested that nonlinear conservative 
oscillators are the most plausible class of 
biological oscillators to model central pattern 
generators in that they provide the necessary 
time-keeping function as well as independent 
shaping of the output (see also Kelso & Tuller, 
1984). The recent demonstration by Moore et al. 
(1988) that mandibular muscle actions for speech 
are fundamentally different than for chewing 
suggests that the patterning for each behavior is 
different. That is, speech and chewing may share 
the same generator but have different patterning 
or, conversely, rely on different generators and 
patterns. Conceptually and theoretically, a 
fundamental frequency oscillator and static 
nonlinear shaping function can generate a number 
of complex patterns. While speculative, some 
current CPG models have the necessary 
complexity to be tentatively applied and 
rigorously tested as to their appropriateness for 
speech motor control. 
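
The claim that a single rhythm generator combined with a static nonlinear shaping function can yield several distinct output patterns can be illustrated numerically. The following is only a toy sketch under stated assumptions, not one of the CPG models cited above: the function names, the 5 Hz rhythm, the sampling rate, and the two shaping functions are all hypothetical choices made for illustration.

```python
import math

def rhythm(freq_hz, duration_s, rate_hz=1000):
    """Time-keeping function: a unit-amplitude sinusoid sampled at rate_hz."""
    n = int(duration_s * rate_hz)
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]

def shape_saturate(x, gain=4.0):
    """Static (memoryless) nonlinearity: soft saturation, plateau-like output."""
    return math.tanh(gain * x)

def shape_rectify(x):
    """Static (memoryless) nonlinearity: half-wave rectification, phasic bursts."""
    return max(0.0, x)

def count_cycles(signal, level=0.5):
    """Count upward crossings of `level`; one per cycle for these waveforms."""
    return sum(1 for a, b in zip(signal, signal[1:]) if a < level <= b)

# One shared rhythm (assumed 5 Hz for 1 s), two different shaping functions.
base = rhythm(freq_hz=5.0, duration_s=1.0)
plateau = [shape_saturate(x) for x in base]   # hypothetical pattern A
bursts = [shape_rectify(x) for x in base]     # hypothetical pattern B
```

Both shaped outputs inherit their periodicity from the same rhythm while differing in waveform, which is the sense in which rhythm generation and pattern generation can be varied independently in this sketch.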

SUMMARY 

From the present perspective, the speech motor 
control system is viewed as a biophysical structure 
with unique configurational characteristics. The 
structure does not constrain the system's 
operation but significantly affects the observable 
behavior and hence the resulting acoustic 
manifestations. Consideration of the structural 
organization and the potential contributions from 
biomechanical interactions are suggested as 
potential explanations for some speech motor 
variability. Sensorimotor mechanisms were 
implicated as the means by which adjustments in 
characteristic vocal tract shapes can be 
dynamically and predictively modified to 
accommodate the changing peripheral conditions. 
From the perspective of the vocal tract as the 
controlled system, the consistent coordinative 
timing relationships reflect the functional 
modification of all the control elements or 
articulatory structures. Rather than describing 
sound production as the modulation or assembly 
of discrete units of action, the current functional 
perspective suggests that entire vocal tract actions 
are modulated to regulate acoustic/aerodynamic 
output parameters. The different parameters are 
realized by manipulation of the frequency of the 
forcing function applied uniformly to the control 
elements of the system. Rather than a parametric 
forcing in which some parameter such as stiffness 
is viewed as a regulated variable, it is 
hypothesized that the system is extrinsically 
forced by manipulation of the frequency of neural 
output consistent with the spatial requirements 
(e.g., movement extent) of the task. The 
frequency-modulated neuromotor actions are then filtered 
through a complex peripheral biomechanical 
environment resulting in elaborate kinematic 
patterns. Speech motor control is viewed as a 
hierarchically organized control structure in 
which peripheral somatic sensory information 
interacts with central motor representations. The 
control scheme is viewed as hierarchical from the 
standpoint that the motor adjustments are 
embedded within a number of levels of organization 
reflecting the overall goal of the motor 
act, communication. Modifications in the control 
signals reflect the parallel processing of multiple 
brain regions to scale and sequence changes in 
overall vocal tract states (Gracco & Abbs, 1987). 
The organizational characteristics of speech as a 
motor control system are fundamentally similar to 
other sequential motor actions and are felt to 
involve a limited number of general sensorimotor 
control processes. 

Sensorimotor Mechanisms in Speech Motor Control 

Vincent L. Gracco 



A conceptual model of speech motor control is developed in which the elemental units for 
speech are sound-producing coordinated movements of the vocal tract. The perspective 
taken is that the degrees of control freedom are at a system level; in the operation of the 
processes that implement the speech motor action. Speech motor control is conceptualized 
as a multistage parallel process in which vocal tract specifications are activated by central 
motor commands which interact with a central rhythmic output to produce serial 
coordinated movements required for sound generation. Vocal tract specifications include 
the selection of characteristic neuromotor patterns, which map isomorphically onto the 
phonemes of the language. Coordination of the contributing movements and on-line spatial 
adjustments within and among vocal tract structures are inherent in the neuromotor 
patterning and activation processes, respectively. The elemental units are retrievable 
patterns stored in the central nervous system and instantiated by the directed action of 
the posterior parietal cortex. Two major brain systems (basal ganglia-supplementary 
motor area and the cerebellar-premotor area) are proposed to play major roles in 
implementing neuromotor specifications by modulating the characteristic patterns and 
sequencing their actions into larger meaningful units of production. It is the action and 
interaction of these sensorimotor mechanisms that result in the speech motor patterns 
characteristic of human verbal communication. 



INTRODUCTION 

If you root yourself in the ground, you can afford to 
be stupid. But if you move, you must have 
mechanisms for moving, and mechanisms to ensure 
that the movement is not utterly arbitrary and 
independent of what is going on outside. 

Patricia Smith Churchland (1986). 

After years of theoretical debate and endless 
empirical investigations, the classic central- 
peripheral issue that has guided much of the 
research in motor theory has given way to the 
more reasonable perspective that movement 
reflects an interaction of peripheral influences and 
central motor processes; behavior is sensorimotor 
in nature. Moreover, it is becoming increasingly 
clear that any behavior is a reflection of multiple 
overlapping and interacting influences, each of 
which needs to be identified. The purpose of identifying 
the subcomponents is not strictly 
to assign function to structure but to evaluate 
their potential contribution to the overall process, 
and hence allow development of realistic and 
biologically plausible working models of the 
system. An important research focus in human 
motor behavior has become the development of 
models that capture the essence of sensorimotor 
control (P. M. Churchland, 1989; Marr, 1982; 
McClelland & Rumelhart, 1986; Pellionisz & 
Llinas, 1979; Pellionisz & Llinas, 1985; Rumelhart 
& McClelland, 1986). The rationale for such an 
endeavor is twofold: first, there is an inherent 
richness and intricacy to even the simplest 
problem of sensorimotor control, and second, there is an 
implicit assumption that higher functions such as 
cognition are not discontinuous with the lower 
level sensorimotor functions that implement them 
(see P. S. Churchland, 1986). In this regard a 
statement by Hughlings Jackson made over 115 
years ago seems prophetic: 

I cannot conceive what even the highest nervous 
centres can possibly be, except developments out of 
lower nervous centres, which no one doubts to 
represent impressions and movements. 

J. Hughlings Jackson (1875). 

Because of its well-learned and ecologically significant 
nature, speech is an ideal behavior for 
investigation of sensorimotor control mechanisms. 
Moreover, as a reflection of one of man's most 
highly developed behaviors, a thorough understanding 
of the processes of communication may 
provide valuable insight into the operation and 
functional organization of the human nervous 
system. 

The purpose of the following chapter is to propose 
a preliminary conceptual model of speech 
production from a functional (e.g., communicative) 
perspective that is grounded as much as possible 
in physiological mechanisms and plausible nervous 
system processes. Implicit in any model of 
human behavior is the tacit assumption that the 
hypothetical processes or functions actually exist 
in some form in the central nervous system or at 
least emerge from central, peripheral and/or 
biomechanical interactions. As such, the conceptual 
model will be limited to constructs known or 
suspected from nervous system mechanisms. How 
many different mechanisms are required to explain 
the observable behavior? What aspects of 
the observable behavior need to be explained or 
accounted for? What role does peripheral sensory 
information play in the control of speech movements? 
What are the organizational principles for 
speech production? These are some of the issues 
that will be dealt with in the following chapter. 
Because the model presented is conceptual in nature 
and preliminary in form, only basic principles 
will be presented and many details will be lacking. 
One important component that will not be discussed 
is the contribution of the biomechanical periphery 
to the shaping of the complex kinematic 
patterns characteristic of speech. Only through 
incorporation of the physical properties of the vocal 
tract with underlying sensorimotor mechanisms 
can a realistic and parsimonious model be constructed. 
Within this limitation, a focus on underlying 
global sensorimotor processes should provide 
an additional and potentially viable perspective on 
speech production and perhaps a better perspective 
on motor speech disorders as well. 

Organizational structure for speech 
motor control 
In order to discuss the sensorimotor mechanisms 
that may underlie speech production it is 
first necessary to determine the most plausible 
conceptualization of the system being controlled. 
During speech, different vocal tract actions are 
coordinated to produce groups of linguistically-relevant 
sounds. Over the last 8-10 years, attempts 
have been made to determine the specific organization 
for speech motor control, i.e., to identify the 
appropriate level of articulatory organization. The 
lack of invariant individual articulatory actions 
and the relatively consistent ensemble articulatory 
actions suggest that the nervous system does 
not explicitly control the action of a single muscle 
or articulator (Gracco & Abbs, 1986; Kelso & 
Tuller, 1984; Saltzman, 1986). Rather, speech motor 
actions are organized at a level that reflects 
the interaction of a number of muscles and/or articulators 
engaged in the same functional task. 
For example, the final positions of the upper lip, 
lower lip, and jaw during bilabial production are 
not invariantly fixed but vary systematically 
within some range such that an apparent goal, oral 
closure, is achieved (Gracco & Abbs, 1986). 
Similarly, when the movement of an articulator is 
unexpectedly impeded during its normal motion, 
displacement is increased in the perturbed articulator 
as well as in various unperturbed articulators 
actively involved in producing the movement 
goal (Abbs & Gracco, 1984; Gracco & Abbs, 1986; 
1988; Kelso, Tuller, V.-Bateson, & Fowler, 1984; 
Shaiman, 1989). Relative timing patterns observed 
for the upper lip, lower lip, and jaw, and for the lower 
lip, jaw, and larynx in various phonetic contexts 
suggest that coordinative adjustments across vocal 
tract components are an important property of 
the motor control process (Gracco & Löfqvist, 
1989; Gracco, 1988; Gracco & Abbs, 1986; Löfqvist 
& Yoshioka, 1981; 1984). Consistent relative timing 
relations, distributed compensatory actions, 
and systematically variable articulatory interactions 
suggest that speech motor control must be 
viewed from a perspective encompassing ensemble 
articulatory actions. An important research question 
is the size of the ensemble, i.e., the size of the 
production unit. 

One possible approach to the question of articulatory 
organization is captured in the construct of 
a coordinative structure (Fowler, Rubin, Remez, & 
Turvey, 1980; Kelso, 1986; Kugler et al., 1980; 
Saltzman & Kelso, 1987; Turvey, 1977). For 
speech, such a style of organization involves a 
number of flexible, but relatively constrained articulatory 
actions or ensembles, represented 
conceptually as tract variables (see Saltzman, 
1986; Saltzman & Munhall, 1989) or 
physiologically as functional synergies (Fowler et 
al., 1980; Kelso, 1986; Kelso & Tuller, 1984) 
assembled into larger action units to produce 
sound (Browman & Goldstein, 1989; 1990; 
Saltzman, 1986; Saltzman & Munhall, 1989). 
From this perspective, speech sounds result from 
the assembly of vocal tract actions (constriction 
producing events) from presumably independent 
primitive gestural units (Browman & Goldstein, 
1989; Fowler et al., 1980; Kelso, 1986). This 
particular organizational scheme can be thought 
of as horizontal in the sense that the vocal tract is 
partitioned into articulatory subsystems which 
are marshalled into task-specific patterns (Kelso, 
1986). However, one assumption of the 
coordinative structure, i.e., an active process that 
pieces together or assembles elementary or 
primitive articulatory actions, has never been 
critically evaluated. 

The construction of complex behaviors from 
simpler movements has been suggested for other 
tasks such as locomotion (Flashner, Beuter, & 
Arabyan, 1988), handwriting (Hollerbach, 1981; 
Morasso & Mussa-Ivaldi, 1982; Edelman & Flash, 
1987; Lacquaniti, 1989) and pointing movements 
(Atkeson & Hollerbach, 1985; Morasso, 1981). A 
major difference, however, is that the behavior is 
organized vertically in the sense that complex behavioral 
sequences are composed of smaller 
segments involving the entire effector unit rather 
than anatomical parts. For example, a primitive 
stroke in handwriting would involve all necessary 
components of the shoulder, arm and hand to produce 
a curved line (an elemental stroke) rather 
than isolated actions of the parts. Using the same 
analogy, speech production may be described as 
the concatenation of fundamental actions such as 
opening the vocal tract (as in the production of 
vowels) and closing the tract (as in the production 
of consonants) which produce or modulate sound. 
Rather than viewing the production of a /p/, for 
example, as involving a number of independent 
gestures (lip aperture gesture, a glottal gesture, 
an oral and pharyngeal gesture, and a velar gesture) 
assembled through a coordinative process, a 
simpler perspective is to view speech production 
in a wholistic sense in which characteristic neuromotor 
patterns, involving all components of the 
vocal tract, are the elemental control structure for 
speech. It can be argued that observations of distributed 
compensatory actions involving local and 
remote articulatory adjustments (Abbs & Gracco, 
1984; Folkins & Abbs, 1975; Folkins & 
Zimmermann, 1982; Gracco & Abbs, 1988; Kelso 
et al., 1984; Shaiman, 1989) are consistent with a 
level of organization in which vocal tract 
configurations are manipulated with no need for 
additional processes to assemble fundamental, 
nonspeech producing units. Similarly, recent 
findings such as the apparent adjustment in 
laryngeal timing to lower lip perturbation 
(Munhall, Löfqvist, & Kelso, in press) and the consistent 
relative timing among lip constriction/occlusion 
movements and glottal devoicing 
(see Figure 1 from Gracco & Löfqvist, 1989) suggest 
that neuromuscular adjustments across vocal 
tract structures are accomplished through manipulation 
of a common driving signal (Gracco, 
1988) applied in a systematic manner to all active 
components of the vocal tract involved in producing 
a particular sound. It is apparent, however, 
that the available empirical evidence is consistent 
with either perspective and that conceptually 
identification of the primitive units of speech 
motor control is not important. Only in attempting 
to develop a realistic and parsimonious neurobiological 
and biophysical model of speech motor control 
does this issue have direct theoretical relevance. 

CHARACTERISTIC MOTOR PATTERNS 

As suggested above, coordinated sound-produc- 
ing vocal tract actions, consistent with a segmen- 
tal organization, are viewed as the smallest func- 
tioning structural units in the sensorimotor 
control process for speech. These hypothesized 
units are not abstractions, but characteristic 
neuromotor patterns whose implementation results 
in the production of sound. The characteristic 
patterns are similar to ideas presented by others 
such as Joos (1948), Fowler (1983), Saltzman and 
Munhall (1989) and Löfqvist (1990) but differ 
mainly in their level of description. At a 
neurophysiological level, these characteristic 
patterns are not invariant but are hypothesized to 
reflect a reference neural substrate which other 
sensorimotor processes act on, resulting in output 
variability. This conceptualization is different 
from earlier speech production models which 
postulated the presence of invariant motor 
commands in that the patterns are one part of a 
distributed process, not the output of the system. 
The suggestion that speech production involves 
characteristic (not invariant) patterns is both 
logical and observable. For example, bilabial 
production always involves, to some degree, the 
same muscles produced with related characteristic 
actions. Presented in Figures 2a and 
2b is a representative neuromuscular pattern for 
the upper and lower lip muscles and the resulting 
movement for the nonsense word "sapapple." 
Within certain boundary conditions, oral opening 
for an open vowel such as /ae/ will result in some 
activity in the upper lip elevator and lower lip 
depressor muscles, indicated in the figures as 
levator labii superior (LLS; Figure 2a) and 
depressor labii inferior (DLI; Figure 2b), respectively. 

Figure 2a. Rectified muscle activity for an upper lip elevator (levator labii superior, LLS), two upper lip depressors 
(depressor anguli oris, DAO, and orbicularis oris superior, OOS), upper lip displacement (ULx) and acceleration 
(bottom trace) illustrating a portion of what can be considered a neuromuscular pattern for oral closing. For the upper 
lip, the large negative-going acceleration marks the onset of the segment, followed by phasic bursts of muscle activity 
in DAO and OOS accompanying the oral closing. 



Figure 2b. Rectified muscle activity for a lower lip depressor (depressor labii inferior, DLI), two lower lip elevators 
(orbicularis oris inferior, OOI, and mentalis, MTL), lower lip displacement (LLx) and acceleration (bottom trace). For 
the lower lip, the large positive-going acceleration marks the onset of the segment, followed by phasic bursts of 
muscle activity in OOI and MTL accompanying the oral closing. This pattern of activation, along with the one 
presented in 2a, are considered characteristic of all bilabial sounds. 

Oral closing for any bilabial will involve some 
degree of activity in upper lip depressor muscles 
such as depressor anguli oris (DAO) and 
orbicularis oris superior (OOS) (Figure 2a), and 
activity in lower lip elevators (Figure 2b) such as 
mentalis (MTL) and orbicularis oris inferior (OOI); 
some cocontraction in LLS and DLI will accompany 
the closing action presumably to increase 
the overall stiffness of the lips and/or perhaps to 
damp the movements. This description reflects a 
consistent pattern of muscle action that accompanies 
all bilabial sounds. In contrast, bilabials are 
not produced with the tongue and, all things 
equal, are usually produced at a faster rate than 
vowel sounds. While there are certainly differences 
in some of the other contributing muscles in 
the vocal tract depending on whether the sound is 
/p/, /b/, or /m/, these are based on the particular 
aerodynamic or acoustic requirements for the 
sound. Similarly, the relative timing of such actions 
is also systematically related, indicating 
that while the timing patterns may differ, they 
are related in a predictable manner observing 
simple scaling laws (Gracco, submitted). What 
uniquely defines each sound in the language is its 
particular neuromuscular configuration reflecting 
a distinct spatio-temporal pattern of activation 
and resulting motion. These patterns are not designed 
to explain all the details of observable 
speech movement actions, but are viewed as one 
fundamental component in the motor control process. 
Each component or group of components in 
the specification may have different activation 
patterns which reflect the form of the signal that 
impinges on lower motor neurons. In part the activation 
patterns reflect the contribution of the 
specific articulator to the sound as well as adjustments 
for the different biomechanical properties of 
the articulators. The activation patterns for the lip 
and the jaw muscles, for example, reflect their 
contribution to closing the oral end of the acoustic 
tube; the activation patterns are phasic, producing 
rapid closing movements; and the timing of medial 
pterygoid action occurs before the labial muscles 
due to the inertia of the jaw. In contrast, the activation 
patterns for the pharyngeal constrictors are 
more tonic and of longer duration reflecting their 
role in adjusting the tissue impedance of the vocal 
tract walls. In this regard, these patterns are 
viewed functionally as representing the essential 
dynamics of speech movement production and 
modulated by the differential filtering properties 
of the biomechanical periphery. 

Prior to motor output at the periphery, these 
characteristic patterns are proposed to have a two 
or three dimensional spatial representation 
within, at least, the primary motor cortex and 
perhaps other nonprimary motor areas as well. 
Rather than attempt to present a speculative 
schematic spatial representation within the central 
nervous system, a schematic of a characteristic 
neuromuscular implementation realized at 
the periphery will be presented. Shown in Figure 
3 is a representation of the output signals sent to 
various neuromuscular components of the vocal 
tract to produce a /p/. Given that many of the details 
are not currently known, the figure provides 
only the important neuromuscular components of 
the pattern. Further, muscle actions are functionally 
grouped such that upper lip depressors 
(orbicularis oris superior and depressor anguli 
oris), for example, are only represented based on 
their articulatory consequences. At this level of 
observation, the characteristic motor patterns are 
isomorphic with the gestural constellations in the 
computationally sophisticated Linguistic Gestural 
Model (LGM) developed and implemented at 
Haskins Laboratories by Browman, Goldstein, and 
colleagues (Browman & Goldstein, 1985, 1986, 
1989, 1990) and incorporate aspects of the earlier 
and more recent properties of the task dynamic 
model (TD) developed and refined by Saltzman 
and colleagues (Saltzman, 1986; Saltzman & 
Kelso, 1987; Saltzman & Munhall, 1989). The major 
difference (besides the fact that the TD and 
LGM are computational and this model has no 
such constraints!) is that many of the details that 
coordinate task-related vocal tract actions and differentiate 
sounds of the language are incorporated 
into stored nervous system elements which effectively 
reduce the on-line computational complexity. 
The rationale for such an approach is that 
speech, as a well-learned (or over learned) motor 
behavior, incorporates much of its operation into 
automatic sensorimotor functions. 

Within the current model, the characteristic vocal 
tract configurations and the phonemes of the 
language are isomorphic. This requires 43 different 
vocal tract specifications each with its characteristic 
neuromuscular specifications retained in 
nervous system memory; 43 is certainly not a 
number that would tax nervous system storage or 
processing capabilities. However, it is not clear 
that this is the fundamental unit of production or 
that phonemes are an important organizational 
unit; rather, sound producing vocal tract actions 
are the lowest level of sensorimotor control. As 
such, included in Figure 3 are the neuromuscular 
signals preceding oral closing (associated with a 
generic vowel) since in most cases opening and 
closing actions must be tightly coupled. From a 
sensorimotor perspective, a VC or CVC (opening-closing) 
organization is more appealing as a unit 
of speech motor control. As suggested above, inherent 
in each pattern is the temporal 
coordination among the constituent components. 

Figure 3. A schematic peripheral representation of a characteristic pattern of vocal tract activation for the bilabial /p/. 
The dotted lines generally demarcate the segment boundaries. Abbreviations are as follows: VE-velar elevator, VD-velar 
depressor, ULE-upper lip elevator, ULD-upper lip depressors, LLE-lower lip elevators, LLD-lower lip depressor, 
JOP-jaw openers, JCL-jaw closers, EXT-extrinsic (tongue muscles), INT-intrinsic, C-constrictors, GOP-glottal opener, 
GCL-glottal closers. 

The time course of activation of the particular 
components and the particular signal shapes result 
in consistent and systematic coordinative patterns 
associated with various sounds. It is not 
surprising that relative timing is so consistent 
even in the face of mechanical perturbations 
(Gracco & Abbs, 1986; Gracco & Löfqvist, 1989; 
Gracco, 1988). These characteristic patterns can 
then be modulated according to other task related 
factors such as the distance to be moved, the overall 
rate of movement, and the presence of various 
stress adjustments. The patterns, with their inherent 
relative timing relations, can be easily 
compressed or expanded in a systematic manner 
by modulation of the frequency and/or amplitude 
of the input signals. It is also likely that the signal 
shapes vary for different articulators, since each 
articulator has specific biomechanical properties 
and such differences have generally been taken 
into account at least during development. Finally, 
separate processes extrinsic to the pattern such as 
those for speech rate and stress specifications 
should result in a unitary adjustment in all vocal 
tract structures. The observation of simultaneous 
respiratory, laryngeal, and oral adjustments accompanying 
emphatic stress-related manipulations 
is consistent with this organizational scheme 
(Fowler, Gracco, & V.-Bateson, 1989). 
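
The idea that a pattern can be compressed or expanded while its inherent relative timing is preserved can be made concrete with a small numerical sketch. This is an illustration only, assuming a purely uniform scaling of the time axis (amplitude modulation is not modeled); the component names and onset times are invented for the example and are not measured values.

```python
def scale_pattern(onsets_ms, rate_factor):
    """Uniformly compress (rate_factor > 1) or expand (rate_factor < 1) a pattern."""
    return {name: t / rate_factor for name, t in onsets_ms.items()}

def relative_timing(onsets_ms):
    """Express each onset as a fraction of the interval from first to last onset."""
    start = min(onsets_ms.values())
    span = max(onsets_ms.values()) - start
    return {name: (t - start) / span for name, t in onsets_ms.items()}

# Hypothetical onset times (ms) for components of an oral closing action.
pattern = {
    "jaw_closer": 0.0,
    "lower_lip_elevator": 20.0,
    "upper_lip_depressor": 30.0,
    "glottal_opener": 60.0,
}

fast = scale_pattern(pattern, rate_factor=2.0)  # assumed doubling of speech rate
```

Under this assumption the absolute onsets in `fast` are halved, while `relative_timing(pattern)` and `relative_timing(fast)` give the same fractions for every component, which is the kind of invariance the text attributes to a single adjustment applied to the whole pattern.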

Before proceeding, a number of points should be 
discussed. First, while vocal tract specifications 
involve description of individual muscles and submuscle 
actions, it is not being suggested that the 
child learning to speak has to obtain control over 
all the individual muscular degrees of freedom. 
More likely, certain synergies exist, even at birth, 
that reflect constraints on the sound producing 
mechanism. As early as the birth cry, the infant is 
producing coordinated actions of the respiratory, 
laryngeal and supralaryngeal systems, or a cry 
would not be possible. As such, patterns are present 
that can be used as the basis for further differentiation. 
It is certainly plausible that these 
fundamental patterns are learned by the child 
during development based on some fundamental 
nonspeech actions emerging from breathing, sucking, 
chewing, swallowing, crying and early vocalizations. 
For example, breathing involves opening 
of the glottis, which must be accompanied 
by relaxation (or significant reduction) 
in the activity of laryngeal adductors. Similarly, 
crying involves coordination of expiration with laryngeal 
adduction to produce vibration. As the 
child matures, variations of this pattern may form 
the basis for voicing and devoicing. During chewing 
a basic pattern of jaw opening, accompanied 
by relaxation of jaw closing, forms a pattern that 
can be modified to produce the more variable jaw 
patterns for speech. Speech motor development 
may be envisioned as a learning process in which 
the child makes finer and more varied adjustments 
in its vocal tract, generalizing from fundamental 
nonspeech actions, to produce sounds. It 
is suggested that such actions become fixed once a 
sound is acquired by the child, and the characteristic 
neuromuscular pattern becomes a retrievable 
element in the child's sensorimotor repertoire. 

There are a number of reasons for 
conceptualizing vocal tract actions from a 
neuromuscular perspective. First, the ability to 
fractionate control of muscles into functional 
chunks is consistent with the level of control 
exercised by the nervous system (English, 1982; 
Loeb, 1985). This is not to suggest that the 
nervous system controls muscles as opposed to 
movements; rather the detailed somatotopy and 
apparent fractionated control at the level of the 
motor cortex and brainstem can be exploited 
during speech acquisition to provide the 
framework to assemble patterns involving 
synergistic and part muscle actions. Second, 
description of the physiological characteristics of 
speech movements has the potential to provide a 
level of observation and detail not possible with 
more traditional kinematic accounts. This 
perspective captures the essence of the neural 
signals which co-occur with the contractile forces 
creating movement. With the concomitant 
development of realistic biomechanical models or 
elaboration of the biomechanical properties of the 
vocal tract, such signals can be used heuristically 
to determine which aspects of speech movement 
need to be explained in a control sense and which 
details emerge from passive biomechanical 
properties of the articulators. Finally, explicit 
consideration of the neuromuscular activation of 
vocal tract components provides insight into the 
manner in which these characteristic patterns 
become modified during implementation. 

MODIFICATION OF VOCAL TRACT CONFIGURATIONS 

To implement any action, the specific muscles involved 
can have only one of three distinct states of 
specification: activated, inhibited, or null. 
Unspecified articulators (null states) allow contiguous 
segmental vocal tract actions to intrude, 
resulting in coarticulation (see Fowler, 1980; Kent 
& Minifie, 1977; Öhman, 1966; Saltzman & 
Munhall, 1989). Similarly, vocal tract actions involving 
the same articulator can be blended with 
the rate of segmental adjustments determining 
the observable manifestation (see Munhall & 
Löfqvist, 1992; Stetson, 1951). Vocal tract actions 
may have contiguous phonetic segments with differing 
degrees of antagonistic action associated 
with a particular articulator. In certain contexts, 
neighboring submuscle actions of a particular articulator, 
such as the anterior and posterior portions 
of the tongue, may result in antagonistic action 
and articulator undershoot. One of the consequences 
of explicit consideration of neuromuscular 
organization is that coarticulation and other related 
phenomena involving the smearing of characteristic 
vocal tract states should be affected by a 
combination of factors including degree of competition 
in contiguous segments and the overall 
speed or frequency of production. Further, if the 
sensorimotor control scheme outlined in the previous 
section is correct, there should be certain observations 
that are concomitant with coarticulatory 
phenomena. For example, if lip rounding is 
anticipated from a rounded vowel (/u/ for example) 
during the production of a nonlabial consonant 
such as /t/, the tongue body motion and resulting 
configuration for the /u/ should also show some 
effect of the intrusion of the /u/ segment. There 
should be an indication that the entire segment 
has blended rather than just a feature (see 
Daniloff & Hammarberg, 1973; Kent & Minifie, 
1977 for reviews). In the present scheme, however, 
the specific coarticulatory influences cannot be 
entirely predicted without a fundamental description 
and understanding of the neuromuscular 
configurations associated with specific vocal tract 
actions. This includes some understanding of the 
contribution of the biomechanical periphery and 
the interactions of the anatomical linkages to the 
sculpting of kinematic patterns (Gracco, 1990). In 
the following section, the role of peripheral sensory 
information will be considered as a means to 
modify the central motor commands. 

Sensory influences 

An important consideration concerning the sensorimotor control
of speech is the influence of various sensory modalities. The
specific extent and mode of sensory influences on speech motor
output is still a matter of empirical investigation and
theoretical contention and is one area that is often
overlooked in speech production models. Information extracted
from the different sensory modalities forms the basis for
communicative, linguistic, or sensorimotor adjustments
resulting in global as well as local effects on speech output.
There are three sensory channels that have the potential to
modify speech motor output, each in overlapping but unique
ways: visual, auditory, and somatic. During normal speaking
situations, visual information regarding one's vocal tract is
not typically available; direct sensorimotor linkages are
nonexistent. Rather, visual input is restricted to information
regarding the communicative environment and provides what can
be thought of as global influences on the motor control
process. Faced with an environment that will require sound
transmission across relatively long distances, such as a
classroom or lecture hall, the output intensity that a speaker
uses will be adjusted to assure communicative effectiveness.
Similarly, when speaking to someone who is experiencing
auditory acuity difficulties (temporary or permanent), the
speaker may also modify the precision of articulatory
adjustments to assist the listener. In general, visual
information does not appear to play a significant or
consistent role in the direct regulation of speech motor
output. Rather, visual-motor influences can be thought of as
adaptive and are more likely used for cognitive and certain
linguistic adjustments affecting certain global sensorimotor
parameters.

To evaluate the potential effects of auditory input on the
motor control process, the auditory input can be eliminated
(temporarily) or distorted in various ways. Some useful
information has been obtained using this kind of experimental
approach. For example, long duration exposure to high levels
of auditory masking (Kelso & Tuller, 1983; Lane & Tranel,
1971; Ringel & Steer, 1963), delayed auditory feedback (Black,
1951; Fairbanks, 1965; Zimmermann, Brown, Kelso, Hurtig, &
Forrest, 1988), and low pass filtering (Forrest, Abbas, &
Zimmermann, 1986) are some of the conditions that can disrupt
a subject's auditory input. However, the issue of whether the
modifications observed reflect the lack of auditory
information or whether the modifications reflect long term
exposure to novel feedback conditions has not been adequately
addressed. Since sensory input can have both facilitatory and
inhibitory effects on motor output, introducing novel
conditions for extended periods of time may result in changes
that only indirectly, at best, reflect the potential
contribution of the sensory modality to the normal motor
control process. The best method for auditory disruption to
date has been developed by Barlow and Abbs (1978), in which
the subjects' own acoustic output (sidetone) is unpredictably
eliminated for short durations (200 ms) on a small percentage
of experimental trials. While such a paradigm does not provide
a natural probe into the system operation, it is much less
obtrusive than previous techniques that suffer from potential
adaptation effects.
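
The logic of an unpredictable, short-duration sidetone
elimination can be illustrated with a small trial-scheduling
sketch. This is not the procedure of Barlow and Abbs (1978);
it only assumes that some fixed fraction of trials is chosen at
random to receive a 200 ms sidetone gap. The function name, the
10% probe fraction, and the seed are hypothetical choices made
for illustration.

```python
import random


def sidetone_schedule(n_trials, probe_fraction=0.1, probe_ms=200, seed=None):
    """Mark a random subset of trials for brief sidetone elimination.

    probe_fraction (assumed 10% here) is the share of trials on which
    the speaker's own acoustic output is removed for probe_ms.
    """
    rng = random.Random(seed)
    n_probe = round(n_trials * probe_fraction)
    # Sample without replacement so the probe count is exact.
    probe_trials = set(rng.sample(range(n_trials), n_probe))
    return [
        {"trial": i, "sidetone_off_ms": probe_ms if i in probe_trials else 0}
        for i in range(n_trials)
    ]


schedule = sidetone_schedule(100, probe_fraction=0.1, seed=1)
n_probes = sum(1 for t in schedule if t["sidetone_off_ms"] > 0)
```

Because probe trials are drawn at random and kept rare, the
speaker cannot anticipate them, which is the property the text
credits with limiting adaptation effects.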

Most researchers would agree that auditory information during
speech development is critical to the acquisition of the sound
patterns of the language. Long term elimination of auditory
information or the lack of auditory information during speech
development can severely affect the ability to maintain or
acquire speech. As such, auditory input is considered
instrumental in developing the characteristic neuromotor
patterns that form the basis for the present model. Once
acquired, however, the potential role of the auditory system
may be limited. Even so, auditory information is still used in
a corrective manner, as evidenced by the adjustments one makes
to slips of the tongue and other kinds of speech errors. In
terms of on-line sensorimotor processes, reduced or distorted
auditory information has been shown to result in rather subtle
deficits in speech output. From some recent experimental
evidence some have suggested that auditory information might
play a role in the ongoing modulation of speech motor output
(Barlow & Abbs, 1978; Forrest, Abbas, & Zimmermann, 1986;
Zimmermann, Brown, Kelso, Hurtig, & Forrest, 1988). The
dynamic properties of the acoustic signal can be related in a
systematic, albeit nonlinear, way to articulatory motion, and
could conceivably be useful in making predictive articulatory
adjustments. To date, however, direct experimental evidence is
limited.

Early research efforts to assess the potential role of somatic
sensory information from skin and muscle receptors located
throughout the vocal tract relied on local or nerve block
anesthesia to eliminate sensory inflow. Results were equivocal
but suggested to some that somatic sensory information,
similar to auditory information, may play a role in speech
acquisition but not in the regulation of the speech of adults
(see Borden, 1979; Gracco & Abbs, 1987; Perkell, 1980 for
reviews). It is doubtful, however, given the extent and degree
of sensory innervation in the human vocal tract, that somatic
sensory information can ever be truly eliminated. The lack of
significant sensory reduction effects noted in some studies,
then, suggests that speech can be produced, for a limited
time, without the full complement of incoming sensory
information. This does not necessarily indicate that speech is
afferent-independent, but that speech production is an
integrated process with distributed and overlapping functions.
Eliminating or reducing the contribution of one component of
the process results in other components compensating for the
loss.

More recently, mechanical loads unexpectedly applied to
various articulators have been used to evaluate whether
somatic sensory information is important to the ongoing motor
control process. The reasoning is that, if sensory receptors
located in various regions of the vocal tract are being
continuously, or quasi-continuously, monitored during
speaking, then disrupting articulatory movement should result
in observable compensation. Results have clearly shown that
somatic sensory signals have the necessary characteristics to
be useful in the on-line control of speech movements. Somatic
sensory adjustments are rapid, usually less than a reaction
time, and functionally organized such that the most directly
perturbed articulators provide the major adjustment, with
secondary adjustments seen in anatomically remote,
functionally-related articulators. The distributed nature of
the compensation strongly suggests that sensorimotor
interactions, in the form of distributed synaptic linkages,
are a feature of the neural organization for speech. Rapid,
precise somatotopic and topographic adjustments have, to date,
only been demonstrated from analysis of mechanical
perturbation, suggesting a dominant role for somatic sensory
input in the ongoing modulation of speech motor output. This
is not to suggest that other sensory modalities do not
contribute to the ongoing sensorimotor control process;
rather, the experimental evidence is lacking. It appears that
the central nervous system is constantly receiving information
on all phases of speech production, and sensory considerations
are as important in understanding motor control as perceptual
considerations are important for understanding action.

Perhaps the best way to illustrate the manner in which direct
sensory information can be used in the control of movement is
to consider the motor task itself. Speaking involves the
continuous modulation of the vocal tract, producing local and
global aerodynamic events structuring the air in
characteristic ways. The specific vocal tract configurations
are constantly changing during speaking, with the same sound
exhibiting variable movement patterns dependent on, among
other things, phonetic context. From perturbation studies it
is known that sensory information from somatic sensory
receptors can interact with central motor commands to make
short-term (within a few hundred milliseconds) and longer term
contextual adjustments in speech motor output. The
characteristic neuromuscular pattern previously presented
(Figure 3) can easily be adjusted through the vast
sensorimotor linkages within and among vocal tract structures.
As such, somatic sensory input from antecedent articulatory
events can be used to modulate select properties of the
neuromuscular pattern automatically (see Gracco, 1987 for
discussion). In the case of a /p/ preceded by either a low
vowel, a neutral vowel, or a high vowel, the oral aperture
would reflect different degrees of openness with respect to
some neutral or reference level. The somatic sensory input
would, based on well established sensorimotor linkages,
modulate the neuromotor pattern accordingly. Recent
experimental results for bilabial sounds preceded by high or
low vowels are consistent with the idea that there is an
overall modulation of oral closing actions based on oral
opening considerations (see also Folkins & Linville, 1983); an
estimation of oral opening can be easily obtained from the jaw
movement (or position) associated with the preceding vowel
(cf. Gracco, 1987; Gracco, submitted). Further, when the oral
opening distance is reduced due to a high vowel preceding
closure, the upper and lower lip closing movements are reduced
together, suggesting that upper and lower lip control signals
are modulated together. The resultant modulatory effects of
sensorimotor linkages are dependent on a number of factors,
including the parameters of the central activation signals and
the strength and sign of the synaptic connections (the
wiring). Sensorimotor interactions with characteristic
neuromotor patterns provide a means to reduce the
computational requirements of contextual variations by
providing automatic adjustments in the control signals based
on the conditions at the periphery.

SEQUENCING OF VOCAL TRACT 
ACTIONS 
Speech is more than the specification of characteristic motor
patterns adjusted for context. An important consideration in
speech production is the sequencing of vocal tract actions
into communicatively meaningful units of production. While
speech is a specialized human function, the view taken here is
that it is one of many important brain functions and any
theoretical account must adhere to principles that are shared
by other similar behaviors. If one accepts the premise that
the human brain has evolved from earlier brains (based on the
need to predict and control species-specific events in the
environment), then supposing that more complex, higher-level
behaviors developed from lower level related behaviors, within
and across species, is a logical extension. This is not to
suggest that speech, locomotion, and handwriting, as examples
of sequential motor behaviors, share specific motor patterns;
rather, they may share similar mechanisms for their
implementation as well as adhere to similar organizational
principles (see Grillner, 1982; Kelso & Tuller, 1984). Common
organizational principles and sensorimotor processes may be
used for speech and other motor behaviors, although they will
be adapted to specific task requirements (e.g., communication)
and effector properties. Speech and other sequential motor
behaviors such as typing, handwriting, locomotion,
mastication, and to a lesser extent respiration involve serial
ordering of muscle actions and movements. For more automatic
behaviors such as mastication and locomotion, central rhythm
generators have been identified which produce
behavior-specific rhythmic motor output similar in form and
function to those identified in lower vertebrates. Differences
in muscle activity and movement patterns for speech, chewing,
and respiration clearly indicate that the same central pattern
generator does not underlie all behaviors (Moore, Smith, &
Ringel, 1988; Smith & Denny, 1990).

A number of observations, however, are consistent with the
presence of some kind of rhythm generating mechanism or neural
network as the basis for sequential speech motor adjustments.
For example, compensatory adjustments for lower lip
perturbations during an oral closing movement demonstrate
changes in interarticulator timing consistent with the
operation of an underlying oscillatory or rhythm generating
mechanism (Gracco & Abbs, 1988; 1989). Specifically, the
timing of the oral closing action is advanced (vowel duration
is shortened) if the perturbation occurs prior to the onset of
the closing action (Gracco & Abbs, 1988). In a complementary
investigation it was also found that if a lip perturbation was
unexpectedly removed well in advance of oral closure, the
closing action was delayed (vowel duration increased) (Gracco
& Abbs, 1989). These results are consistent with a conclusion
that phase-related effects of sensory stimuli, resulting from
the perturbation, interact with rhythmic motor output to
modify sequential timing. The qualitative observation of
spatiotemporal consistency of sequential movements associated
with repeated production of sentence-length material (see
Gracco, 1990) is also suggestive of an underlying sequencing
mechanism. Other results, such as minimal movement durational
changes to static (Lindblom, Lubker, Gay, Lyberg, Branderud, &
Holmgren, 1987) and dynamic perturbation (Gracco & Abbs,
1988), are consistent with an underlying mechanism in which
sequential timing is maintained.

Recent experiments and theoretical perspectives on the neural
control of rhythmic respiratory movements offer an interesting
framework for speech movement sequencing (Feldman, Smith,
McCrimmon, Ellenberger, & Speck, 1988). It has been suggested
that the central pattern generator for respiration may more
appropriately be regarded as two separate, but interacting,
processes: one specifying the pattern of muscle actions, and
one specifying the timing of the output (the rhythm). A
similar scheme can be suggested for speech. The characteristic
neuromotor patterns for speech sounds outlined above interact
with a central rhythm generating process which dictates the
timing of the output (see also Saltzman & Munhall, 1989). Two
studies of note have attempted to evaluate the apparent
rhythmicity of speech. Ohala (1975) recorded over 10,000 jaw
movements over a 1.5 hour period of oral reading. Although
there were frequencies evident from spectral analysis in the
range of 2-6 Hz, significant variability was also observed. In
contrast, Kelso et al. (1985) reported a rather strong
periodicity, with little variability, at approximately 5-6 Hz
for lower lip-jaw movements during reiterant speech. The
results of the two studies are only contradictory if one
assumes that context should not interactively affect rhythmic
output. The Ohala study did not constrain the reading material
and, hence, reflected a range of
phonemic content. Kelso and colleagues, on the other hand,
restricted the phonemic content to /ba/ and /ma/. It seems
more likely, given the
intrinsic timing character of various sounds, that output
frequency may be modulated by phonemic context; the sounds of
the language may have their own intrinsic frequency (timing)
properties (cf. Fowler, 1980). For example, vowels can be
categorized as long or short, generally related to their
average relative duration, and consequently to different speed
and extent of jaw opening actions. Similarly, movements of
various articulators associated with high pressure consonants
are often produced at a faster rate than their voiced low
pressure counterparts. As shown recently, the oral closing
movement is initiated sooner, with a tendency for higher
closing movement velocity, when the consonant is /p/ as
opposed to /b/ or /m/ (Gracco, submitted). It is suggested
that a central rhythm generator provides the framework for the
sequencing of sound-specific patterns which contain certain
intrinsic phoneme-specific differences resulting in the
continuous modulation of the basic rhythm.
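
The kind of spectral estimate referred to above (a dominant
movement frequency somewhere in the 2-6 Hz range) can be
sketched as follows. This is only an illustration on a
synthetic signal, not the analysis used by Ohala (1975) or
Kelso et al. (1985); the function name, the 100 Hz sampling
rate, and the 1-10 Hz search band are assumptions.

```python
import math


def dominant_frequency(signal, sample_rate_hz, f_min=1.0, f_max=10.0):
    """Return the frequency (Hz) with the most power in [f_min, f_max].

    Uses a direct discrete Fourier transform, which is slow but needs
    no external libraries and is adequate for short records.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the resting position
    k_lo = max(1, math.ceil(f_min * n / sample_rate_hz))
    k_hi = min(n // 2, math.floor(f_max * n / sample_rate_hz))
    best_freq, best_power = None, -1.0
    for k in range(k_lo, k_hi + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n)
                 for i, x in enumerate(centered))
        im = sum(x * math.sin(2 * math.pi * k * i / n)
                 for i, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = k * sample_rate_hz / n, power
    return best_freq


# Synthetic 5 Hz "jaw" oscillation, 2 s long, sampled at 100 Hz (assumed).
fs = 100
jaw = [3.0 * math.sin(2 * math.pi * 5.0 * i / fs) for i in range(2 * fs)]
peak_hz = dominant_frequency(jaw, fs)
```

With unconstrained reading material, one would expect the power
to be spread over a band rather than concentrated at one
frequency, which is the contrast the two studies illustrate.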

An important consequence of incorporating a central rhythm
generator into a speech production model is the ability to
explain rate, stress, and boundary changes with manipulation
of a single mechanism: global and local changes in the
frequency of the rhythm. Changes in speaking
rate can be viewed as an increase in the output of the
generator, producing characteristic changes in the segments as
well as their sequencing. For example, increasing the output
frequency of the generator (increasing speech rate) is
accompanied by higher amplitude, shorter duration bursts of
muscle activity (see Figure 4 for example; also Gay, Ushijima,
Hirose, & Cooper, 1974; Gay & Hirose, 1973), which results in
higher movement velocities, as shown in Figure 4, and a
reduction in movement displacement (Kelso et al., 1985). The
reduction in movement displacement is a consequence of greater
gestural overlap (Browman & Goldstein, 1989; Saltzman &
Munhall, 1989), effectively increasing the damping. Similarly,
stress and final lengthening can be viewed as a local decrease
in the output frequency. It is the case that phrase-final
lengthening and stress exhibit different kinematic effects
(see Edwards, Beckman, & Fletcher, 1991). However, these may
merely reflect differences in context, such that phrase-final
articulations are less constrained because of the relative
time between them and the next segment, and the movement
continues longer and farther as a consequence; there is no
active mechanism to arrest the movement. The possibility that
a central rhythm generator underlies the serial timing is an
attractive hypothesis that is in need of empirical validation.
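
The claim that a faster rate can pair a higher peak velocity
with a smaller displacement can be checked with simple
arithmetic. The sketch below assumes each closing movement is a
smooth half-cosine of duration T covering displacement A, so
that peak velocity is pi * A / (2 * T). This movement shape and
the numbers used are illustrative assumptions, not values from
Figure 4 or from the studies cited.

```python
import math


def peak_velocity(displacement_mm, duration_s):
    """Peak velocity (mm/s) of a half-cosine movement.

    Position is modeled as x(t) = (A / 2) * (1 - cos(pi * t / T)),
    whose velocity peaks at mid-movement with value pi * A / (2 * T).
    """
    return math.pi * displacement_mm / (2.0 * duration_s)


# Hypothetical values: a slow-rate and a fast-rate lip closing movement.
slow = peak_velocity(displacement_mm=10.0, duration_s=0.15)
fast = peak_velocity(displacement_mm=8.0, duration_s=0.10)
```

Under these assumptions the fast movement is 20% smaller yet
reaches a higher peak velocity, because the shortening of
duration outweighs the loss of displacement.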

POTENTIAL NEURAL MECHANISMS 
From the previous discussion, it has been suggested that there
are multiple functional processes underlying the generation
and sequencing of speech movements. These processes include
phonological (vocal tract) specification, sensorimotor
integration, and sequencing of sound-producing elements. A
fundamental premise in the present model is that there are
characteristic patterns stored in the nervous system whose
selection and activation initiate events which ultimately
produce coordinated sequential vocal tract actions. At present
any attempt to speculate on where or how such patterns are
stored would be premature. However, it is possible to consider
the sensorimotor implementation of these hypothetical patterns
as well as to speculate generally on the contribution of
various distributed neuroanatomical systems that are known to
be involved in speech production (cf. Abbs, 1986; Gracco &
Abbs, 1987; Kent, 1990 for reviews).

Figure 4. Averaged (n=12) muscle activity for upper lip and
lower lip muscles and the associated upper and lower lip
closing movement velocities. Subject repeated the word
"sapapple" at a fast and slow (subject defined) rate. Averages
were aligned to the peak jaw opening velocity (not shown).
Although the peak velocities are higher during the fast rate
condition, compared to the slow rate condition, the resulting
displacements are smaller.

In humans, acquired lesions posterior to the central sulcus
result in a form of fluent aphasia characterized by varying
degrees of phonological impairment (Blumstein, Cooper, Zurif,
& Caramazza, 1977; Blumstein, Cooper, Goodglass, Statlender, &
Gottlieb, 1980; Tuller, 1984). Given the large representation
of facial structures, and the projections to supplementary and
premotor cortices (Petrides & Pandya, 1984; Wiesendanger &
Wiesendanger, 1984), posterior parietal cortex (area 7b),
having sensory, motor, and behavioral functions (Hyvärinen,
1981; 1982), seems a likely candidate for the instantiation of
phonological goals. As suggested above, it is not clear where
the phonological specifications are stored, but once recalled
from memory the posterior parietal region may be involved in
the setting up of a number of neuroanatomical systems used for
the implementation of speech motor actions. As such, posterior
parietal and no doubt portions of frontal cortex are
"upstream" from the sensorimotor implementation of speech
production and can be viewed as performing a prescriptive or
executive function.

In contrast, two major brain systems, involving the basal
ganglia and supplementary motor area (SMA) and the cerebellum
and pre-motor area (PM), are viewed as the major
implementation centers to carry out the details of the speech
production process. The basal ganglia-SMA system, as surmised
from human lesion and behaving nonhuman primate studies,
appears to have the requisite function to be involved in
scaling the hypothesized characteristic neuromotor patterns in
the present model. For example, behavioral data from human
limb studies (see Marsden, 1984 for review) and focal
stimulation and lesion data from behaving nonhuman primates
indicate that the primary deficit is an inability to scale
muscle actions (DeLong, Alexander, Georgopoulos, Crutcher,
Mitchell, & Richardson, 1984; Horak & Anderson, 1984a,b). SMA
lesions appear to exaggerate the inability to scale muscle
actions to task, often resulting in total speech arrest
(Arseni & Botez, 1961; Caplan & Zervas, 1978) and a pronounced
reduction in self-initiated voluntary movement (see
Wiesendanger, 1985 for review). Parkinson's disease results in
speech movement impairments that reflect generalized reduction
in the speed and extent of articulatory movements, resulting
in perceptually distorted consonants, slowed speech rate, and
a tendency toward monotone. It is suggested that these
deficits reflect a generalized reduction in the ability to
scale muscle actions to the specific speech movement
requirements. Consistent with the location of the basal
ganglia upstream from motor cortex and the relatively indirect
access of direct sensory information, it is suggested that the
neuromuscular scaling operation is controlled by cortical
influence, predominantly the SMA with secondary influences
from other cortical areas (Alexander, DeLong, & Strick, 1986).

Speech movement deficits associated with Parkinson's disease
do not demonstrate impairments in the duration of the
individual movements (Connor, Abbs, Cole, & Gracco, 1989;
Forrest et al., 1989), suggesting that the basal ganglia are
not involved in the sequencing of movements. However, aphasic
patients with anterior cortical lesions and ataxic dysarthrics
demonstrate a sequencing difficulty manifest in voice onset
timing (see Baum, Blumstein, Naeser, & Palumbo, 1990;
Blumstein et al., 1977; 1980), a sequencing difficulty
consistent with damage to the premotor area, which receives
projections from the cerebellum, a neural structure involved
in timing movement sequences (Kent & Rosenbek, 1982; Gracco &
Abbs, 1987; Ito, 1984). Similarly, neurophysiological
investigations in nonhuman primates have shown the PM to be
involved in the sensory guidance of movements (Godschalk,
Lemon, Nijs, & Kuypers, 1981; Halsband & Passingham, 1982;
Rizzolatti, Scandolara, Matelli, & Gentilucci, 1981), similar
to the function proposed for the cerebellum (Ito, 1984;
Soechting, Ranish, Palminteri, & Terzuolo, 1976). In general,
the cerebellar-PM system appears to function as an important
component in the incorporation of peripheral sensory signals
into the central motor commands.

The final component in the present model is the hypothesized
central rhythm generator. While there is no evidence that the
cerebellum is the site of a central rhythm generator for any
motor action, it has been suggested by Ito (1984) that the
cerebellum may contribute to the timing of many rhythmic motor
behaviors. The speech timing changes associated with
cerebellar damage are consistent with at least a contributing
role. Other considerations for the locus of a central rhythm
generator would be the intricate synaptic connections within
the brainstem that could possibly be temporarily set into
oscillation by directed input from cortical structures,
similar to the central masticatory rhythm generator (Nakamura,
1986 for example). An alternate possibility is that speech
rhythm, and hence serial timing, is a network property that
emerges from a hierarchical organization (Martin, 1972). It is
clear that a definitive answer to the presence and possible
location of a central rhythm generator underlying speech
timing will require a great deal more experimental
consideration.

One prediction from the sensorimotor organization presented in
the present chapter, in which the vocal tract is considered
the smallest functional control structure operated on by
sensorimotor scaling and timing processes, is the absence of
subphonemic speech errors as would occur with speech subsystem
impairment (Abbs, Hunker, & Barlow, 1983). Except for cases of
focal nervous system damage such as a dystonia, or lower
motoneuron damage, speech motor impairments specific to an
articulatory subsystem should not occur. The deficits
associated with various nervous system damage may result in
different degrees of impairment because of the biomechanical
or physiological differences of individual articulators.
However, it is not clear that surface differences are a true
reflection of underlying differential deficits. For a variety
of speech motor disorders due to damage to basal ganglia,
cerebellum, and anterior and posterior cortical areas,
deficits are observed that are consistent with a global rather
than focal breakdown. That is, the major neuroanatomic
sensorimotor systems involved in speech production, including
the basal ganglia-supplementary motor system,
cerebellar-premotor cortical system, and inferior parietal
cortex, appear to function, not in the control of movement per
se, but in processes from which movement emerges.

SUMMARY 
The framework that emerges from the preceding is that speech
motor control involves a small number of sensorimotor
processes applied in a unitary manner to the vocal tract and
modulated according to task requirements such as speech rate,
articulatory precision, and suprasegmental stress. In the
current model, these processes include selection and
activation of characteristic vocal tract actions,
spatiotemporally scaled according to phonological
considerations, such as intrinsic timing properties, and
peripheral conditions. Somatic sensory information is an
important component of the system, allowing dynamic modulation
of relatively stereotypic motor commands. An underlying
rhythmic mechanism is proposed which provides the temporal
framework for sequential speech adjustments as well as a
mechanism to systematically vary suprasegmental speech timing.
These fundamental sensorimotor processes interact and overlap
to produce the continuous dynamic modulation of the vocal
tract, generating time-varying pressures and flows. An
important constraint on the model is that the underlying
processes are consistent with generally accepted nervous
system operations. An important prediction from the model is
that nervous system damage, unless extremely focal, should
produce global deficits attributable to one or some
combination of three major nervous system functions for
speech: pattern specification, scaling of muscle actions, and
initiation and sequencing of the production units.

Analysis of Speech Movements: Practical Considerations

and Clinical Application

Vincent L. Gracco



The instrumental evaluation of speech movements is an important adjunct to the
assessment and understanding of speech motor disorders. As the interface between the
nervous system and aerodynamic modifications in the vocal tract, movement variables
such as displacement, velocity, acceleration, and their time histories can provide direct
information on speech motor disorders that can only be inferred from acoustic or
perceptual evaluation. Impairment in various aspects of neuromotor functioning is
reflected in the motion of individual articulators and their coordination, and may reflect
early signs of functional change due to disease or trauma. Within certain limits, movement
analysis can be used as an objective method for categorizing speech motor disorders and
monitoring change due to therapeutic intervention. Further, objective comparison of
orofacial motor behavior during speech and nonspeech tasks may provide diagnostic
insight into underlying pathophysiological processes. A perspective on the potential utility
of speech movement analysis in the assessment, treatment, and understanding of speech
motor disorders is the focus of the present chapter. The limitations of speech movement
analysis and the need for clinically-relevant research will be presented.



INTRODUCTION 

With the increased availability of measurement devices for
transducing movements of the speech articulators, computer
software for automated processing and analysis of data, and
decreased cost of computer hardware, instrumental evaluation
of human vocal tract movements is becoming more feasible for
inclusion into the clinic. Analysis of upper and lower limb
movements employing various instrumental tests has been used
for the last 40 years to aid in the evaluation and diagnosis
of various pathophysiological conditions and to determine the
outcome of clinical trials (see Potvin & Tourtellotte, 1985
for review). For speech, movement analysis is a potentially
important adjunct to more traditional acoustic and perceptual
analyses used routinely in the clinic. In addition, analysis
of speech and nonspeech (orofacial) movements can be used to
evaluate the consequences of motor disorders that have not yet
developed to the point of significantly affecting the
communicative process. The purpose of the present chapter is
to outline some of the ways in which analysis of movement
parameters and movement patterns may be used clinically.
Before proceeding, it may be helpful to reiterate a point made
by Potvin and Tourtellotte (1985):

"To the extent that instrumented tests can be developed for
measuring functions, their selective use can provide
information that might not otherwise be available. However,
investigators should be aware that the ability to measure
small differences reliably can yield statistically significant
differences that may not be of clinical importance."

In the following, the focus will be on measure- 
ments that may have specific functional utility in 
terms of assessing speech production capabilities, 
detecting differences in neurologic function, and 
improving understanding of speech motor performance. 
Because of the current limitation in normative 
data and the wide range of inter- and intrasubject 
variability, both qualitative and quantitative 
methods will be presented. 

INTERPRETATION OF MOVEMENT 

The evaluation of movement can be approached 
from a variety of perspectives. From a motor 
control perspective, speech production is observed 
to be a sequential production of different vocal 
tract configurations that are coordinated in space 
and time and overlap to various degrees. Visual 
inspection of speech movements allows for a 
qualitative impression of overall motor 
functioning. Compare, for example, the lip and 
jaw movement signals presented in the left half of 
Figure 1, obtained from a neurologically normal 
subject, with the movement signals in the right 
half of the figure, obtained from a subject with 
Parkinson's disease (PD). Each subject is 
repeating the same sentence and the scaling for 
the two sets of signals is the same. Without 
knowing what is being said, and disregarding the 
respective acoustic signals, it can be seen that 
there are marked differences in the two sets of 
movements. While there are some general 
similarities in the overall movement patterns, the 
extent of articulator motion of both the upper lip 
(UL) and lower lip/jaw (LLJ) movements for the PD 
subject is less than for the normal subject, consistent with 
the clinical manifestations of hypokinesia. 
Movement velocities, displayed above and below 
the respective UL and LLJ displacements, are 
severely reduced in magnitude for the PD subject 
as well. Further insight can be gained into the 
manifestations of the disorder by evaluating the 
acoustic signal simultaneously with the movement 
signals. The impoverished and slow movements 
from the Parkinson's subject are accompanied by a 
poorly differentiated acoustic signal consistent 
with the perceptual speech characteristics of 
imprecise consonant production. 




[Figure 1 panel label: "Buy Bobby a poppy"; time scale: 500 msec] 

Figure 1. Upper lip (UL) and lower lip/jaw (LLJ) movement displacement and velocity from a neurologically normal 
subject and a subject with Parkinson's disease (PD). The subjects' task was to repeat the utterance "Buy Bobby a 
poppy" at a comfortable rate and loudness with even stress. Shown below each set of movement signals is the 
respective acoustic speech signal. 

Other qualitative observations can be made 
from movement signals that are important for a 
thorough understanding of the sensorimotor 
breakdown and functional deficits associated with 
particular speech disorders. Previous research 
has shown that multiple articulators 
engaged in the production of the same sound 
display spatial and temporal patterns that reflect 
their cooperative behavior (Gracco, 1988, 1990; 
Gracco & Löfqvist, 1989). Individual speech 
movements generally display smooth continuous 
motion characterized by a unimodal velocity profile 
(Gracco & Abbs, 1986; Munhall, Ostry, & 
Parush, 1985; Nelson, 1983; Ostry, Cooke, & 
Munhall, 1987). Breakdown in the coordinative 
action of multiple articulators, a loss in the ability 
to smoothly sequence concatenated vocal tract 
gestures, or multiple peaks in the velocity profile 
associated with a single articulatory movement 
are observations that reflect qualitatively on the 
processes of speech motor control. From examina- 
tion of discrete events associated with a single 
speech or nonspeech motor task, it is also possible 
to functionally evaluate the neuromotor system at 
the level that reflects on the net force applied to 
articulators to produce individual movements. In 
order to generate movement a certain pattern of 
excitation and inhibition is produced in the ner- 
vous system and directed to the lower motor neu- 
rons. The action potentials generated by the input 
signals result in two distinct peripheral events; 
electrical responses in the muscle membranes 
producing EMG's, and the generation of forces 
originating from the contractile elements of the 
muscles. Movement reflects the summation of net 
active and passive forces with a certain time his- 
tory filtered through the biomechanical properties 
of the structures being moved. If the structure is 
at least in part inertial, the initial acceleration of 
the load will be proportional to the initial contrac- 
tile force. Similarly, the peak velocity of a move- 
ment is generally proportional to the force magni- 
tude integrated over the movement time. 
Inspection of individual movement patterns can 
provide heuristic information regarding the neu- 
romotor functioning of the patient and reflect on 
the mechanical characteristics of particular 
articulators. 
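
The proportionality between force and the kinematic measures described above can be illustrated with a minimal numerical sketch. It assumes a purely inertial load (a point mass with no viscous or elastic forces) and a hypothetical net force that follows half a cosine cycle; the mass, force, and duration values are illustrative only and are not measurements from any articulator.

```python
import math

def simulate_inertial_load(peak_force, mass=0.01, duration=0.12, dt=1e-4):
    """Integrate a = F / m for a point mass driven by a net force pulse.

    Assumptions (hypothetical, for illustration only): the load is purely
    inertial, and the net force follows half a cosine cycle, positive during
    the accelerative phase and negative during the decelerative phase.
    Units: force in N, mass in kg, time in s.
    """
    steps = int(round(duration / dt))
    velocity = 0.0
    peak_velocity = 0.0
    for i in range(steps):
        force = peak_force * math.cos(math.pi * (i * dt) / duration)
        velocity += (force / mass) * dt           # Euler integration of acceleration
        peak_velocity = max(peak_velocity, velocity)
    initial_acceleration = peak_force / mass      # acceleration at movement onset
    return initial_acceleration, peak_velocity

# Doubling the force doubles both the initial acceleration and the peak velocity.
accel_small, vel_small = simulate_inertial_load(peak_force=0.5)
accel_large, vel_large = simulate_inertial_load(peak_force=1.0)
print("acceleration ratio:", accel_large / accel_small)
print("peak velocity ratio:", vel_large / vel_small)
```

In this idealized case the velocity profile is unimodal, and the peak velocity equals the force integrated over the accelerative phase divided by the mass. Real articulators also have viscous and elastic properties, so the sketch indicates only the direction of the relationship.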

BASIC KINEMATICS 
In order to objectively and quantitatively evalu- 
ate speech movements a measurement framework 
is required. Any description of movement relies on 
the terminology of kinematics. A complete kinematic 
description of any movement, especially of 
the vocal tract, is geometrically complex. For most 
purposes, the motion of bodies can be reduced 
from irregularly shaped masses to points, and the 
motion of such points can be described with kinematic 
variables. The description of point motion is 
analytically complex, requiring 15 data variables 
which change over time (Winter, 1979). For clinical 
purposes, the displacement (the distance from 
a starting to an ending position) and velocity (the 
directional speed) are the most useful for describing 
articulatory motion. In order to keep track of 
the changing kinematic variables and maximize 
their descriptive usefulness it is important to 
adopt a reference convention and a coordinate system. 
Motion can be described relative to some 
static articulatory position, such as lip movement 
relative to a rest position. An alternative that also 
provides spatial information is to reference the 
movements to an immobile anatomical structure. 
The most frequently used spatial coordinate sys- 
tem involves three perpendicular axes represent- 
ing the sagittal, frontal, and transverse planes. 
Movements of articulators can then be described 
with respect to inferior-superior (y), anterior-pos- 
terior (x), and lateral-medial (z) directions, re- 
spectively, relative to some anatomical reference. 
The most important consideration for clinical use 
is that a convention be established, one that is 
consistent with respect to the purpose of the mea- 
surement and reproducible within and across 
subjects. 

As mentioned, the displacement of a point on an 
articulator surface and the velocity at which the 
articulator moves are two important kinematic 
variables fundamental to the description and 
evaluation of motor disorders characterized by 
hypokinesia (reduction in movement extent), 
bradykinesia (slowness in movement; reduced ve- 
locity), and akinesia (slowness in movement initi- 
ation). Shown on the left in Figure 2 is a position 
time history of a single midsagittal point on the 
lower lip as it moves from opening for a vowel to 
oral closure for /p/. Under the displacement signal 
is the time history of the instantaneous velocity 
mathematically derived from the displacement 
signal. From the displayed signals, the maximum 
displacement, calculated as the distance between 
onset position and offset position associated with 
the movement, and the associated peak instanta- 
neous velocity, are easily obtained. Additionally, 
the duration of the movement, defined as the time 
from onset to completion can also be obtained. As 
shown in the figure, the velocity profile can be further 
dissected to provide information on the accelerative 
and decelerative phases of the movement. 
Ignoring gravity, the accelerative phase of a 
movement generally reflects the increase in net 
force applied to the load (articulator) due to the 
contraction of the muscles. In contrast, the decelerative 
phase of a movement generally reflects the 
decrease in net force acting on the load due to the 
relaxation of the contractile process and any antagonistic 
muscle actions. Displacement and 
velocity measures provide the means to describe 
and quantify movement and also allow some inference 
on the properties of the muscular actions 
that caused the motion. In addition to measuring 
the discrete components of a movement, the frequency 
and amplitude of repeated productions can 
also be calculated, as illustrated on the right side 
of Figure 2. 
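
The discrete measures described above (displacement, peak velocity, duration, and the accelerative and decelerative phases) might be extracted from a sampled displacement signal along the following lines. This is a minimal sketch, not the routine used for the figures: the onset and offset criterion (10% of peak speed) and the synthetic closing movement are assumptions made for illustration.

```python
import math

def central_difference(samples, dt):
    """Velocity estimate from position samples (one-sided at the two ends)."""
    velocity = [0.0] * len(samples)
    for i in range(1, len(samples) - 1):
        velocity[i] = (samples[i + 1] - samples[i - 1]) / (2.0 * dt)
    velocity[0] = (samples[1] - samples[0]) / dt
    velocity[-1] = (samples[-1] - samples[-2]) / dt
    return velocity

def movement_measures(position, dt, threshold=0.1):
    """Summarize a single movement.

    Assumption: onset and offset are the samples around the velocity peak
    where speed stays at or above `threshold` times the peak speed.
    """
    speed = [abs(v) for v in central_difference(position, dt)]
    peak = max(range(len(speed)), key=speed.__getitem__)
    cutoff = threshold * speed[peak]
    onset = peak
    while onset > 0 and speed[onset - 1] >= cutoff:
        onset -= 1
    offset = peak
    while offset < len(speed) - 1 and speed[offset + 1] >= cutoff:
        offset += 1
    return {
        "displacement": abs(position[offset] - position[onset]),
        "peak_velocity": speed[peak],
        "duration": (offset - onset) * dt,
        "acceleration_time": (peak - onset) * dt,    # onset to peak velocity
        "deceleration_time": (offset - peak) * dt,   # peak velocity to offset
    }

# Hypothetical 8 mm closing movement lasting 120 ms, sampled at 1000 Hz.
dt = 0.001
position = ([0.0] * 50
            + [4.0 * (1.0 - math.cos(math.pi * k / 120)) for k in range(121)]
            + [8.0] * 50)
measures = movement_measures(position, dt)
for name, value in measures.items():
    print(name, round(value, 3))
```

Because onset and offset are defined by a velocity criterion, the measured displacement and duration are slightly smaller than the nominal 8 mm and 120 ms; whatever criterion is chosen should be applied consistently within and across subjects.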



[Figure 2 labels: lower lip (LL) displacement and velocity traces with movement onset and peak marked; Displacement = peak position - onset position; Frequency = 1 / cycle duration] 

Figure 2. Representation of the displacement and velocity of a point on the lower lip associated with a single oral 
closing movement for /p/ (left hand portion of the figure). Shown are some of the variables to be measured (see text for 
further details). The displacement and velocity of the same point on the lower lip during repetitive opening and 
closing movements associated with repetition of /pae/ are shown in the right hand portion. From repetitive syllables, 
the frequency of production can be derived as shown. 

INSTRUMENTATION 
Prior to presenting a protocol that we have been 
using to instrumentally evaluate speech and nonspeech 
movements, a brief discussion of the 
movement transduction devices and general oper- 
ating principles follows. Monitoring upper articu- 
lator movement can be accomplished using a va- 
riety of transduction techniques. In general, these 
techniques convert mechanical energy, repre- 
sented as movement of an articulator or group of 
articulators, to electrical energy, represented as 
an analog voltage. Many methods are available to 
convert a physiological event to an electrical signal 
and generally involve direct or indirect variation 
in electrical quantities such as resistance, ca- 
pacitance, inductance, or the magnetic linkage be- 
tween coils. The four basic techniques currently 
available in different forms for use in the speech 
clinic involve strain gauge transduction, optical 
transduction (optoelectronic sensing devices), 
imaging (ultrasound), and electromagnetic transduction. 
The following will briefly review the 
techniques and commercially available devices 
with respect to their basic principles of operation, 
clinical utility, and practical limitations. A more 
detailed analysis can be obtained from various 
sources such as Abbs and Watkin (1976), Baken 
(1987), and Geddes and Baker (1968). 

Strain gauge transduction 

Strain gauges are resistive elements that are 
mounted on a flexible, lightweight strip of metal 
anchored at one end and attached to a moving 
surface on the other end. The voltage output from 
a gauge is proportional to the movement at the 
end of the mobile attachment. Strain gauge 
transducers are used for monitoring external 
articulatory movements such as the lips and jaw. 
Initially, the technique was used in the 
transduction of jaw and lip movements by 
Sussman and Smith (1970a, b). Refinements of the 
method of attachment have been reported by Abbs 
and Gilbert (1973) and Müller and Abbs (1979). A 
significant clinical development was reported by 
Barlow, Cole, and Abbs (1983) in which strain 
gauge transducers were attached to a lightweight 
aluminum frame which could be mounted to a 
subject's head. This refinement allowed the 
monitoring of lip and jaw movement without 
requiring stabilization of the subjects' head; for 
many neurological patients, head stabilization is 
an unacceptable condition. The cantilever beams 
can be instrumented to sense motion in one or two 
(orthogonal) dimensions, although the two 
dimensional units and their attachments add 
significantly to the overall weight and can 
decrease stability. The cantilever beams are 
commonly attached to a point on the midsagittal 
plane (midpoint of the lips and chin) providing 
inferior-superior and anterior-posterior motion 
sensing. Strain gauge transducers provide a 
continuous analog output that can faithfully 
reproduce the fastest lip and jaw movements. A 
bridge amplifier is required for each direction of 
movement to supply an excitation voltage to the 
resistive elements and to amplify the signal prior 
to storage or analog-to-digital (A/D) conversion. 

Optical transduction 

The most notable optical technique for tracking 
human movement involves a position sensing device 
and pulsed light-emitting diodes (LEDs) to track 
points in a two or three dimensional coordinate 
system (Watsmart, Northern Digital, Inc., of 
Waterloo, Ontario, Canada; Selspot, Selective 
Electronics, Inc., of Sweden). Devices that rely on 
the sensing of LEDs are limited in a similar manner 
to the strain gauge devices in that they can 
only be used to monitor the external articulators 
such as the lips and jaw. There are some photoelectric 
devices that rely on the sensing of light 
reflection which can be used to monitor tongue 
movement (Chuang & Wang, 1978; Fletcher, 
1982). However, such optical scanning systems for 
tongue motion require small LED light sources 
and photosensitive detectors arranged in an artifi- 
cial palate worn by the patient. In addition to this 
practical limitation and the lack of commercial 
availability, a distance dependent error has been 
reported requiring a refinement in calibration 
procedures (McCutcheon, Lakshminarayanan, & 
Fletcher, 1990). A final device, using charge cou- 
pled device (CCD) sensors eliminating reflection 
errors, is currently being marketed (Optotrak, 
Northern Digital, Inc.). Similar to the optoelectric 
devices, the CCD device provides three dimen- 
sional information on the movement of visible 
sensors with 0.1 mm accuracy over a one cubic 
meter volume. These commercial devices can also 
be purchased with customized software for analog- 
to-digital conversion, signal processing and auto- 
mated analysis. The most significant drawback to 
these systems is the cost which may be as high as 
$50,000 to $60,000 for a complete three dimen- 
sional acquisition and analysis system. 

Imaging 

The most common imaging device having 
potential clinical application is ultrasound (see 
Sonies, 1982 for review). An ultrasound signal is 
passed into the body and the differential tissue 
properties associated with different structural 
layers provide different reflections to the 
generated sound. The ultrasound reflections are a 
series of echoes that can then be detected by the 
transducer. The longer the echoes take to be 
reflected, the further the tissue is away from the 
source. Through a knowledge of the anatomy and 
the different transmission times, the structures 
within the path of the ultrasound can be 
reconstructed. For the human vocal tract, 
ultrasound can be used to visualize and track 
motion of soft tissue structures such as the tongue 
and vocal folds. A number of research studies have 
employed ultrasound to evaluate the shape and 
motion of the tongue (Sonies, Shawker, Hall, & 
Gerber, 1981; Stone, Morrish, Sonies, & Shawker, 
1987; Stone, Shawker, Talbot, & Rich, 1988), the 
movement of the tongue dorsum during speech 
(Keller & Ostry, 1983), movement of the vocal 
folds during devoicing (Munhall & Ostry, 1985) 
and tongue motion during swallowing (Stone & 
Shawker, 1986). While ultrasound devices are 
commercially available they are costly and are 
often not optimized for vocal tract use. 

Electromagnetic transduction 

Using alternating magnetic fields it is possible 
to track point movement of small transducers 
placed on the tongue, lips, velum, and jaw in the 
midsagittal plane. The basic device employs a si- 
nusoidal signal driving a transmitter coil which 
produces lines of magnetic flux. Small receiver 
coils, or transducers, moving through the magnetic 
field are induced with a signal that is proportional 
to the effective cross-sectional area of 
the receiver coil and the flux density. If the 
transmitter and receiver axes are parallel, the 
magnitude of the induced signal is a measure of 
the distance between the transmitter and receiver. 
Recently, a commercially available electromagnetic 
system for tracking movements of the upper 
articulators has been developed and marketed under 
the name of the Articulograph AG100 
(Carstens Medizinelektronik, Göttingen, West 
Germany). This system allows the tracking of up 
to five small receiver coils placed on various 
supraglottal articulatory structures in the midsagittal 
plane. The transmitter assembly is placed 
on the subject's head and secured in a manner 
similar to the head mounted movement system 
developed by Barlow et al. (1983). Although the 
system is commercially available, development 
and refinement are continuing (see Tuller, Shao, 
and Kelso, 1990 for initial evaluation of system 
performance). The system requires a microcomputer 
to calculate the x-y positions of each transducer 
in real time and stores the data on the computer 
disk. Software routines are provided for 
data display and analysis. Cost of the system, including 
a microcomputer, is approximately 
$i2,000. Other magnetic devices are commercially 
available to record positions and movements of the 
mandible and the interested reader is referred to 
an article by Michler, Bakke, and Møller (1987) 
for further information. 

There are a variety of commercial devices for the 
transduction of speech movements, each with certain 
strengths and weaknesses. The optoelectric 
devices are capable of three dimensional motion 
tracking and provide sophisticated software for 
analysis; the major limitation is the cost. The 
head mounted movement system is a low cost alternative 
that can be used with children and 
adults. The system can be configured to allow 
transduction in two dimensions although some 
problems may arise due to the extra weight of the 
transducer unit. Ultrasound and the 
Articulograph are the only devices available that 
allow transduction of tongue movements. Similar 
to the optoelectric devices, the cost of the respective 
equipment is high. For all devices, a certain 
amount of technical sophistication and a basic understanding 
of the operating principles is required. 
A final consideration is the transduction of 
lower lip and jaw movement. The movement 
transduced at the lower lip is actually a combination 
of lower lip and jaw movement. In order to 
evaluate the separate lower lip and jaw actions 
during speech or nonspeech movements, both the 
jaw movement and the combined lower lip/jaw 
movement are acquired. The jaw signal is then subtracted from the 
lower lip/jaw signal, yielding net lower lip movement. 
Using the magnetic device, a transducer coil 
placed on the midpoint between the lower central 
incisors can be used as a reflection of "true" jaw 
motion. For the optical devices, a custom fitted 
jaw splint can be used with an additional light 
emitting diode used to track jaw motion. While it 
is possible to obtain jaw movement from a sensing 
device placed on the chin, such placement may result 
in skin movement artifact (see Kuehn, Reich, 
& Jordan, 1980). For most clinical applications, 
the combined movement of the lower lip and jaw 
may suffice, eliminating the need to factor out the 
contributions of the two articulators. 
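
The subtraction step described above can be sketched in a few lines. The sketch assumes that the jaw and combined lower lip/jaw signals were sampled simultaneously, along the same axis, and in the same units; the function name and sample values are hypothetical.

```python
def net_lower_lip(lower_lip_jaw, jaw):
    """Subtract jaw displacement from combined lower lip/jaw displacement.

    Assumption: both signals share the same sampling instants, axis
    (for example inferior-superior), and units (for example mm).
    """
    if len(lower_lip_jaw) != len(jaw):
        raise ValueError("signals must have the same number of samples")
    return [combined - jaw_only for combined, jaw_only in zip(lower_lip_jaw, jaw)]

# Hypothetical displacement samples in mm.
combined_signal = [0.0, 3.0, 6.5]
jaw_signal = [0.0, 2.0, 4.0]
print(net_lower_lip(combined_signal, jaw_signal))
```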

Other considerations 

Once obtained, the data must be stored in some 
form for analysis. The storage device may be an 
oscillographic recorder with a paper medium, an 
FM tape recorder, or the signals may be digitized 
directly to computer disk. Data converted from 
analog to digital form require anti-aliasing 
filtering prior to conversion. The general function 
of anti-aliasing is to ensure that false frequencies 
not present in the original signal are not introduced 
into the digitized signal. In order to avoid 
aliasing, the analog signal must be filtered and 
then digitized at a rate that is at least twice the 
cutoff frequency of the anti-aliasing filter. The 
minimum sampling rate is known as the Nyquist 
rate and is calculated by doubling the highest frequency 
contained in the signal of interest. Since 
speech movements contain mostly low frequencies 
(generally below 15 Hz), the Nyquist rate could be 
as low as 30 Hz with the anti-aliasing (low pass) 
filter having a cut off frequency at 15 Hz. 
However, a 30 Hz sampling rate provides a poor 
quality time display with a point sampled only every 
33.3 ms. (A movement that lasts approximately 
120 ms would be represented by only 4 
points.) In order to improve the temporal quality, 
also important when deriving the velocity of the 
movement, higher sampling rates are often used. 
An additional consideration is that hardware 
filters create phase delays in the signal which 
vary as a function of the cut off frequency. 
Therefore, it is generally desirable to use an anti-aliasing 
filter with as high a cut off frequency as 
possible. Once digitized, the movement signals 
may be further smoothed in software to eliminate 
any noise in the signal. Using digital filters time 
delays can be eliminated and the signal can be 
filtered at a much lower frequency. Similarly, 
software differentiation (central difference algo- 
rithm) is the preferred method of obtaining first 
and second derivatives since it does not introduce 
time distortions to the signal. 
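
The two software operations mentioned above, smoothing without a time delay and central difference differentiation, can be sketched as follows. The first-order recursive low-pass filter is a simple stand-in chosen for brevity, not the filter used by any particular analysis package; running it forward and then backward over the signal is one common way to cancel the delay a single pass introduces. The smoothing constant and the test signals are assumptions.

```python
import math

def lowpass_once(samples, alpha):
    """Single pass of a first-order recursive low-pass filter (delays the signal)."""
    output = [samples[0]]
    for value in samples[1:]:
        output.append(output[-1] + alpha * (value - output[-1]))
    return output

def zero_phase_lowpass(samples, alpha):
    """Filter forward, then backward, so the two delays cancel."""
    forward = lowpass_once(samples, alpha)
    backward = lowpass_once(forward[::-1], alpha)
    return backward[::-1]

def central_difference(samples, dt):
    """First derivative; interior points use (x[i+1] - x[i-1]) / (2 * dt)."""
    derivative = [0.0] * len(samples)
    for i in range(1, len(samples) - 1):
        derivative[i] = (samples[i + 1] - samples[i - 1]) / (2.0 * dt)
    derivative[0] = (samples[1] - samples[0]) / dt
    derivative[-1] = (samples[-1] - samples[-2]) / dt
    return derivative

def peak_index(samples):
    return max(range(len(samples)), key=samples.__getitem__)

# Hypothetical symmetric displacement bump centered on sample 100.
bump = [math.exp(-((i - 100) / 15.0) ** 2) for i in range(201)]
single_pass = lowpass_once(bump, alpha=0.2)
two_pass = zero_phase_lowpass(bump, alpha=0.2)
print("peak after one pass:", peak_index(single_pass))
print("peak after forward-backward pass:", peak_index(two_pass))

# The central difference of a quadratic is exact at interior points.
dt = 0.01
quadratic = [(i * dt) ** 2 for i in range(11)]
print("derivative at t = 0.05:", central_difference(quadratic, dt)[5])
```

The single pass moves the peak later in time, while the forward-backward pass leaves it at the original sample, which is why software filtering is attractive when the timing of kinematic events matters.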

MOVEMENT ANALYSIS 
Most movement disorders result in a reduction 
in movement extent (hypokinesia) or speed 
(bradykinesia), a slowness in initiation (akinesia), 
or generally dyscoordinated movement. Each of these 
clinical signs can be evaluated kinematically and 
subsequently quantified for intrasubject comparisons. 
We have recently been using a limited 
speech and oral motor inventory with subjects 
having various movement disorders focusing on 
movements of the lips and jaw. Subjects are re- 
quested to produce syllables and nonspeech ges- 
tures at two rates; a comfortable (preferred) and 
maximal rate. Words and sentences are also re- 
peated at a comfortable rate and are used for both 



qualitative and quantitative examination. 
Nonspeech movements are used to evaluate the 
orofacial motor system to determine the extent of 
neuromuscular involvement. It is felt that this 
protocol provides the minimal amount of informa- 
tion necessary to understand the functional and 
structural changes accompanying many motor 
disorders. In the following, movement data for a 
portion of the protocol will be presented from two 
subjects, both with PD, who have different degrees 
of speech motor impairment. Subject one (S1) has 
minimal speech motor involvement while subject 
two (S2) has a moderately severe dysarthria characterized 
by imprecise consonants. Motion of the 
upper lip and lower lip/jaw was transduced using 
a head mounted movement system (Barlow et al., 
1983) instrumented with strain gauges aligned for 
two dimensional sensing. The head mounted 
frame was oriented such that inferior-superior 
and anterior-posterior movements were referenced 
to the Frankfort plane. 

An initial step in the analysis involves 
examination of some of the data in two 
dimensional space. Shown in Figure 3 is the path 
of the jaw in x-y space, with anterior-posterior 
movements represented on the x axis and inferior- 
superior movements represented on the y axis, for 
a series of speech and nonspeech opening and 
closing movements. The subject (S1) produced repetitive 
opening and closing movements of the lip/jaw 
and repeated the syllable /sa/ for approximately 5 
seconds at two rates: a comfortable (preferred) 
and fast (maximal) rate. A number of observations 
can be made from the x-y representation. First, 
the increase in speed required for the fast rates 
results in a general reduction in the movement 
extent for each task. Second, the extent of 
movement for /sa/ repetitions is less than that for 
opening and closing the mouth and the /sa/ 
repetitions are produced in the middle two thirds 
of the space occupied by the opening and closing 
nonspeech movements. Finally, the path taken by 
the jaw in both tasks and conditions is essentially 
straight and smooth. These observations from an 
individual with Parkinson's disease are qualita- 
tively similar to those made for normal subjects. 
Figure 4, in contrast, displays similar data 
obtained from S2. As mentioned, this subject's 
speech motor skills are more severely affected 
than those of the previous subject. From the x-y representations 
of the speech and nonspeech movements 
it can be seen that the lip/jaw movements 
are reduced in extent, less smooth, and more 
variable than was observed in the previous figure 
(note the different scales for the two figures). 

Figure 3. Two dimensional movement of the lower lip/jaw for repetitive productions of oral opening/closing 
(nonspeech) and /sa/ for S1 (see text). Movement directions as indicated. 

[Figure 3 and 4 axis labels: Superior/Inferior (vertical), Posterior/Anterior (horizontal); panel title: Normal Rate] 

Figure 4. Two dimensional movement of the lower lip/jaw for repetitive productions of oral opening/closing 

In order to evaluate such data quantitatively the 
time histories of the movements must be 
displayed. Presented in Figure 5 are examples 
from the two subjects of continuous opening 
and closing inferior-superior movements 
(nonspeech) of the LLJ. The well defined peaks 
and valleys in the displacement trace provide a 
way of automatically identifying the different 
movement phases (opening-closing) and calculating 
the displacement and frequency of 
repetition. Below each trace is the summary of a 
software routine which identifies the peaks 
and valleys in the displacement trace and 
calculates the frequency of repetition (F0) and the 
average displacement (mm) of the sequential 
movements. 
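
A routine of the kind described above could be sketched as follows. It is an illustration rather than the software used for the figures: extrema are found by simple comparison with neighboring samples, which assumes a signal that has already been smoothed, and the 3 Hz test signal and 120 Hz sampling rate are hypothetical.

```python
import math

def find_extrema(signal):
    """Return sample indices of local peaks and valleys.

    Assumption: the signal has already been smoothed, so every local
    extremum corresponds to a real change in movement direction.
    """
    peaks, valleys = [], []
    for i in range(1, len(signal) - 1):
        if signal[i - 1] < signal[i] >= signal[i + 1]:
            peaks.append(i)
        elif signal[i - 1] > signal[i] <= signal[i + 1]:
            valleys.append(i)
    return peaks, valleys

def repetition_summary(signal, sample_rate):
    """Frequency of repetition (Hz) and average peak-to-valley displacement."""
    peaks, valleys = find_extrema(signal)
    if len(peaks) < 2:
        raise ValueError("at least two peaks are needed to estimate frequency")
    cycles = [(b - a) / sample_rate for a, b in zip(peaks, peaks[1:])]
    frequency = 1.0 / (sum(cycles) / len(cycles))   # 1 / mean cycle duration
    extrema = sorted(peaks + valleys)
    excursions = [abs(signal[b] - signal[a]) for a, b in zip(extrema, extrema[1:])]
    return frequency, sum(excursions) / len(excursions)

# Hypothetical repetitive movement: 3 Hz, 8 mm peak to valley, 3 s at 120 Hz.
sample_rate = 120.0
trace = [4.0 * math.sin(2.0 * math.pi * 3.0 * i / sample_rate) for i in range(360)]
frequency, displacement = repetition_summary(trace, sample_rate)
print("frequency (Hz):", round(frequency, 2))
print("average displacement (mm):", round(displacement, 2))
```

With noisy clinical recordings, a minimum excursion or minimum interval between extrema would normally be added so that small fluctuations are not counted as separate movements.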

Shown in Figure 6 are the upper lip and lower 
lip/jaw movements in the x and y dimensions 
associated with repeated production of the syllable 
/pae/. It can be seen that the upper and lower lips 
move in both a superior-inferior and anterior- 
posterior direction. The movements are generally 
smooth and regular, and the upper lip moves less 
in extent than the lower lip. Shown in the next 
figure (Figure 7) are examples from the two 
subjects illustrating the results of the automated 
analysis routine applied to the displacement 
traces. Average movement displacement and the 
frequency of production at each rate were 
calculated from the inferior-superior movement of 
the lower lip/jaw. Subjects repeated the syllables 
at a comfortable or preferred rate and as fast as 
possible for approximately six seconds. The peaks 
and valleys in the displacement signals are 
indicated by the vertical ticks above the traces 
and the summary measures were calculated as 
shown under each trace. From these results it can 
be seen that the lower lip/jaw movement for the 
more severe subject (S2) displays a smaller 
movement displacement compared to the less 
impaired subject (S1) although the preferred rate 
of repetition is approximately equivalent (2.9 vs. 
2.8 Hz). At the fast rate the less severe subject 
(S1) is able to increase the frequency of production 
(2.8 to 5.4 Hz; 93% increase) with a concomitant 
reduction in the movement displacement (8.0 to 
5.6 mm). In contrast, S2 is unable to increase the 
frequency of syllable repetitions to the same 
degree (2.9 to 3.5 Hz; 20% increase). In addition to 
measuring the movement displacement and 
frequency, similar measures can be made on the 
derived velocity time histories. 

Figure 5. Opening and closing lower lip/jaw movements in the inferior-superior direction for S1 and S2. Peak centered 
information is displayed under each trace. 

A comparison of speech and nonspeech 
movement tasks is presented in the next two 
figures (Figures 8 and 9). These data were 
obtained from the same two subjects presented in 
the previous figure. In each case the subject's task 
was to purse and retract the lips and to repeat the 
vowel sequence "uu-ee" at comfortable and fast 
rates. Because these movements are predominantly 
produced with anterior-posterior 
movements of the lips, only the anterior-posterior 
movements were measured. For S1 (Figure 8), 
both lips appear to be moving together (in phase) 
for all tasks. The consistency of the timing 
relations can be easily calculated using cross 
correlation. The nonspeech task (purse-retract) is 
not constrained by phonetic requirements and 
allows a more detailed evaluation of orofacial 
mobility. For this subject the nonspeech task is 
accomplished by equivalent contributions of the 
upper and lower lips. In contrast, "uu-ee" 
repetitions predominantly involve lower lip action. 
The frequency of both the speech and nonspeech 
tasks increases in the fast rate condition, although 
the nonspeech task demonstrates a greater 
degree of change. Results from the more severely 
involved subject (S2) are presented in Figure 9. 
For this subject, the rate changes are much less 
noticeable with the nonspeech task demonstrating 
a greater degree of impairment than was noted in 
the speech task. In addition, the nonspeech task 
was apparently difficult for S2 who demonstrates 
slow and labored protrusion and retraction of 
the lips. There is also some indication of a 
dyscoordination of the upper and lower lip 
movements at the faster rate during the speech 
task. 
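
The cross correlation mentioned above as a measure of timing relations between the lips might be computed as in the following sketch. It assumes two equally sampled displacement signals and reports the correlation at each lag over the overlapping samples; the signals, the 20 ms delay, and the 200 Hz sampling rate are hypothetical values chosen for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return covariance / math.sqrt(var_x * var_y)

def cross_correlation(upper, lower, max_lag):
    """Correlate upper[i] with lower[i + lag] for each lag in samples.

    A positive best lag means the lower signal trails the upper signal.
    """
    n = min(len(upper), len(lower))
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        start = max(0, -lag)
        stop = min(n, n - lag)
        result[lag] = pearson(upper[start:stop], lower[start + lag:stop + lag])
    return result

# Hypothetical lip signals: 3 Hz repetition, lower lip larger and 4 samples late.
sample_rate = 200.0
upper_lip = [math.sin(2.0 * math.pi * 3.0 * i / sample_rate) for i in range(400)]
lower_lip = [2.0 * math.sin(2.0 * math.pi * 3.0 * (i - 4) / sample_rate)
             for i in range(400)]
correlations = cross_correlation(upper_lip, lower_lip, max_lag=10)
best_lag = max(correlations, key=correlations.get)
print("best lag (ms):", 1000.0 * best_lag / sample_rate)
print("correlation at best lag:", round(correlations[best_lag], 3))
```

A high correlation at a lag near zero corresponds to the in-phase pattern described for S1. A lower peak correlation, or a best lag that changes from one trial to the next, would point toward the kind of dyscoordination noted for S2.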




Figure 8. Two different repetitive tasks involving predominantly anterior-posterior (x) motion of the UL and LLJ for 
S1. Shown are the position time histories for alternating and continuous pursing and retracting and alternating vowel 
production "uu-ee" at preferred and maximally fast rates. Below each panel is the average frequency of production. 

Figure 9. The same repetitive tasks as in Figure 8 involving predominantly anterior-posterior (x) motion of the UL and 
LLJ for S2. Shown are position time histories for alternating and continuous pursing and retracting and alternating 
and continuous vowel production "uu-ee" at preferred and maximally fast rates. Below each panel is the average 
frequency of production except for the purse/retract task because of the slowness of production. 



Other applications 

There are additional applications in which 
movement tranaduction and analyaia can used in 
the clinical evaluation of movement disorders. 
Instrumental testa can be uaed to provide 
information on the reaction time, apeed, and 
visuomotor integrative abilities of the patient In 
simple reaction time, the delay from the 
presentation of an auditory or visual stimulus to 
the onset of some response is measured. If the 
response involves movement to a target, auch as 
closing the lips, the movement time can also be 
measured. Reaction time aiui movement time can 
be differentially affected in certain disorders audi 
as Parkinsonism (Evarts, Teriv&innen, & Calne, 
1981) and provide a means to objectively aaaeaa 
akinesia and bradykineda, respectively during a 
nonspeech task. Tracking tests reqiuire the subject 
to follow a moving target with the output of a 
transducer attadied to one <tf the articulaiora (see 
McClean, Beukelman, k Yorkston, 1987 for 
application to eompoaents of the speech motor 



system). Clinical applications usually involve 
scoring techniques which reflect the magnitude of 
the error between the target and the patients 
output Sudi tests have been us^ul in evaluating 
ataxia or characterizing the impairment in 
producing smooth continuous motion of an 
effector. While not directly applicable to the 
perceptual deficits associated with speech motor 
disorders, these nonapeech reaulta may prove 
uaeful in understanding the neurological condi- 
tion, aspects of which may be masked by compen- 
satory behavior of the i^atient These novel tech- 
niques have been used in evaluating limb impair- 
ments associated with a variety of neurological 
disorders. The reader is referred to Potvin and 
Tourtellotte (1985) for an extensive compil/ition of 
measures and references* 
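
As a rough illustration of the scoring idea described above, the magnitude of the error between the target and the patient's output could be summarized as a root-mean-square (RMS) difference. This is only a sketch: the function name, the units, and the example signals are hypothetical and are not taken from any of the studies cited here, which may use other error scores.

```python
import math

def rms_tracking_error(target, output):
    """Root-mean-square difference between target and transducer output.

    Both arguments are equal-length sequences of positions sampled at the
    same instants (hypothetical units, e.g. mm).
    """
    if len(target) != len(output) or not target:
        raise ValueError("signals must be non-empty and of equal length")
    squared = [(t - o) ** 2 for t, o in zip(target, output)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical example: a slow sinusoidal target and an output that lags it
# by 10 samples (values are illustrative only).
target = [math.sin(2 * math.pi * 0.5 * n / 100) for n in range(200)]
output = [math.sin(2 * math.pi * 0.5 * (n - 10) / 100) for n in range(200)]
print(round(rms_tracking_error(target, output), 3))
```

A larger score indicates poorer tracking; a perfect match between output and target gives a score of zero.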

RESEARCH NEEDS 

In attempting to provide a quantitative basis for
the evaluation of speech movements, two needs
are obvious: the need for standardization and
normative data bases. All measures that have
been described or implicated can be used to
objectively monitor subject performance and evaluate
disease progression or improvement due to
therapeutic intervention. However, diagnostically, such
measures have limited utility due to the lack of
1) currently available norms, 2) solid correlational
studies which attempt to relate kinematic
characteristics to disease states or severity of
involvement, and 3) technical standardization to
allow valid intersubject comparisons. However, it
may be the case that norms, while useful, may
prove to be relatively uninformative or even
misleading due to the range of variability in the
normal population. This is not to suggest that
normative data are not necessary. Rather, it may be
more important to realize that speech movement
data should not be evaluated in isolation without
considering concomitant acoustic and perceptual
characteristics of the disordered speech as well as
overall motor and sensory performance levels.
Only through a synthesis of observations can we
hope to understand the communicative breakdown
that is often interleaved with a more general
sensorimotor deficit due to damage to the nervous
system or modifications in nervous system
operation.

CONCLUSIONS 

Movement analysis is an objective and quantita- 
tive method of describing the behavior of the oro- 
facial system during speech and nonspeech tasks. 
Evaluation of speech movement characteristics
verifies, refines, and extends inferences and
observations based on acoustic, aerodynamic, or auditory
perceptual analyses. Both speech and nonspeech
movements provide important information on the
neuromotor functioning of the patient and
facilitate assessment of disease states. Further,
information related to the movement impairment can
be easily assimilated by members of an
interdisciplinary rehabilitation team. Quantitatively,
movement transduction provides a reliable
estimate of motor performance and can objectively
monitor changes in performance associated with
various forms of therapeutic intervention or
changes in disease state. An improved
understanding of movement deficits that underlie a
specific motor disorder may lead to the
development of novel treatment approaches that might
not otherwise be considered.

Reiterant Speech as a Test of Nonnative Speakers' Mastery
of the Timing of French*

Andrea Levitt†



The reiterant speech of ten native speakers of French was analyzed to develop baseline
measures for syllable and consonant/vowel timing for a series of two-, three-, four-, and five-
syllable French words spoken in isolation. Ten native speakers of English, who learned French
as a second language, produced reiterant versions of both French words and a comparable
set of English words. The native speakers of English were divided into two groups on the basis
of their second language experience. The first group consisted of four university-level teachers,
who were relatively experienced learners of French, and the second group of six less
experienced learners of French. The French reiterant imitations of the two groups of native
speakers of English were compared to the native French speakers' productions. The timing
patterns of the experienced group of nonnative speakers did not differ significantly from
those of the native French speakers, whereas there was a significant difference between these
two groups and the group of six less experienced second-language learners. Deviations from
the French baseline measures produced by the less experienced group are discussed in terms
of the influence of the timing patterns of English and the literature on a sensitive period for
second language acquisition.



INTRODUCTION

Although considerable research shows that
native language phonetic habits influence second
language production, even for experienced
second-language speakers (see Flege, 1986, for an
extensive review), little work has been done on the
influence of first language timing patterns on
second language rhythmic patterns. One such study
(Wenk, 1985) found an influence of native French
rhythmic patterns on the timing of English as a
second language. However, the effect of English
timing patterns on the acquisition of French has
not been directly tested.

The use of reiterant speech to test for such
influence presents several advantages. In
reiterant speech studies, subjects are asked to
substitute a single syllable, often /ma/, for each of
the original syllables in a word or sentence.
Acoustic and perceptual analyses of reiterant
speech have shown that it preserves the prosodic
characteristics of the original utterance (Larkey,
1983; Liberman & Streeter, 1978; Nakatani,
O'Connor, & Aston, 1981; Oller, 1973).
Furthermore, because measurements of segment
and syllable durations are easy with reiterant
speech and are generally unconfounded by
segmental variation, many studies have used such
duration measurements in English for analyzing
rhythm (e.g., Nakatani et al., 1981), for studying
the perceptual effects of timing variations
(Larkey, 1983; Nakatani & Schaffer, 1978), and
especially for determining how durations vary as a
function of utterance position and stress (e.g.,
Oller, 1973). Reiterant speech duration
measurements have also been made on Swedish
(e.g., Lindblom & Rapp, 1973), and comparisons of
the rhythmic features of a group of languages
have been made on the basis of reiterant speech
(Hoequist, 1983; Vatikiotis-Bateson, 1986).
However, very little work has been done with
reiterant speech on the rhythmic features of
French, aside from that done by Vatikiotis-Bateson
(1986), where reiterant speech was used
to determine universal and language-specific
effects on articulator timing in native speakers
from a group of languages. The use of reiterant
speech as a means of testing a non-native
speaker's mastery of the timing patterns of a
foreign language has not been previously
attempted. In learning a second language,
speakers need to learn new timing patterns for
individual segments, often as a function of context
(Mack, 1982), as well as new rhythmic patterns.
Reiterant speech is particularly well suited to
testing the acquisition of new rhythmic patterns
independently from the effects of timing for
non-native segments.

The speech rhythm of French and that of
English are quite distinct. French has been
traditionally classified as a "syllable-timed" language
(e.g., Pike, 1945), with syllables essentially equal
in length. This characterization of French rhythm
has been criticized (e.g., Dauer, 1983; Fletcher,
1991; Wenk & Wioland, 1982) for failing to
recognize the important final-syllable lengthening
that is characteristic of French rhythmic groups,
which may be either the individual "sense groups"
of a French sentence or individual French words
spoken in isolation. Thus, nonfinal syllables
within unemphatic French rhythmic groups are,
except for effects of phonetic variation, essentially
equal in length, whereas final syllables show
considerable lengthening. English, on the other
hand, has been traditionally classified as a
"stress-timed" language (e.g., Pike, 1945). Because of
variable word stress, any English sentence
presents a series of stressed syllables which
alternate with unstressed syllables. A stress-
timed language is supposed to maintain equal
intervals between stressed syllables. Thus, if an
interval between two stressed syllables contains
more unstressed syllables than another, those
unstressed syllables should show relatively
greater compression. English also exhibits
characteristic patterns of final-syllable
lengthening, including word-final, phrase-final,
and utterance-final lengthening (Oller, 1973).



Although the characterization of English as a
"stress-timed" language has also been criticized
(e.g., Dauer, 1983; Wenk & Wioland, 1982), its
rhythmic pattern is nonetheless quite different
from that of French, especially in two salient
respects. First, in English, nonfinal syllables will
vary in length as a function of stress, whereas in
unemphatic French, nonfinal syllables within a
rhythmic group are essentially equal in length.
Second, although both languages exhibit final-
syllable lengthening, the magnitude of the final-
syllable lengthening effect and its location both
vary. Thus, the magnitude of utterance-final
lengthening is greater in French than in English
(Delattre, 1966). In addition, in English,
utterance-final lengthening appears to be greater than
phrase- or word-final lengthening (e.g., Oller,
1973). A similar difference in the magnitude of
final-syllable lengthening has been observed for
utterance-final compared to phrase-final
lengthening in French (Benguerel, 1971; Fletcher, 1991;
but cf. Allen, 1973), but not for words. French
words exhibit final lengthening only at the ends of
rhythm groups or when uttered in isolation.

Which of these rhythmic differences are second- 
language learners of French likely to master first? 
On the one hand, since both languages exhibit 
final-syllable lengthening, English-speaking 
learners of French might find it easier to adjust 
the magnitude of such lengthening as they acquire 
the rhythm of French. On the other hand, Flege 
(e.g., Flege, 1981; Flege, 1987; Flege & 
Hillenbrand, 1984) has proposed that second- 
language learners are more likely to master the 
totally new phonetic features of a second language 
than those that can be assimilated to their native
repertoire. In that case, English-speaking learners 
of French might find it easier to acquire the 
relatively equal timing of nonfinal syllables in 
French, which is not found in English. 

In order to conduct a test of the acquisition of 
French rhythmic patterns by native speakers of 
English, it is first necessary to establish baseline
measures for timing patterns in French using the 
reiterant productions of native speakers of 
French. Not all speakers are equally good at 
producing reiterant speech that preserves the 
timing of the original utterance (Larkey, 1983). 
Thus, it is important that the baseline measures 
be based on the fluent productions of the best 
reiterant speakers. Once these measures have 
been established, they can be compared to 
published findings about the durations of 
consonants, vowels, and syllables in French.
Experiment I reports the results of an experiment
designed to produce such data.

We may then ask how well non-native speakers
of French match the timing patterns of the native
French productions. In Experiment II, reiterant
versions of both French and English words made
by native speakers of English were analyzed in
order to establish a similar set of baseline
measures for reiterant English, to determine how
well the non-native speakers of French differing in
degree of experience with the language match the
timing of the productions of the French speakers,
and to see whether any deviations from the
French baseline measure stem from the influence
of English timing patterns.

I. EXPERIMENT 1

A. Subjects. Ten subjects, five male and five
female, participated in the study. All were native
speakers of French from the Paris region. All of
the subjects have advanced graduate degrees.
Although the majority of their daily verbal
exchanges took place in French, all the subjects
had some experience with other languages, as is
typical of highly educated Europeans.

B. Test materials. The materials for the
experiment consisted of a set of 30 French words,
6 two-syllable, 12 three-syllable and 6 each of
four- and five-syllable words. (See the Appendix
for a complete list of the stimuli.) As stress in
French is on final syllables, and all of the words
were produced in isolation, all of the two-, three-,
four-, and five-syllable words in French were
stressed on their final syllable. Each word was
typed on the center of a 3 x 5 card. The cards were
presented in the same random order to all
subjects.

C. Procedure. Recordings were made in a 
soundproof booth using a Sony tape recorder 
(model TC-510-Z) and a Sennheiser microphone 
(model MD 441-V). The subjects read the word 
typed on the card out loud and then reproduced 
what they had just said by substituting the 
syllable /ma/ for every syllable of the original, 
while preserving both its timing and the melodic 
contour. They were asked to be careful to use the 
syllable /ma/ in all cases and to repeat a stimulus 
item and its reiterant version if they felt they had
made an error.

D. Equipment and measurement methods. The
30 French words and their reiterant versions were
low-pass filtered at 4.9 kHz, digitized at 10 kHz,
and stored on disk, using Haskins Laboratories'
Vax 11-780 computer. All durational
measurements were made by the author on the reiterant
speech using large-scale waveform displays, with
a resolution of 0.1 ms. Differences in amplitude
between the consonant and the vowel, as well as
differences in the appearance of the waveforms
associated with /m/ (the nasal murmur) and /a/,
made segmentation relatively easy. This was
particularly true for reiterant productions by French
speakers. It was very easy in almost all cases to
segment the /m/ and the /a/ because French /m/
and /a/ are kept quite distinct, whereas English
oral vowels in a nasal environment often show
some nasalization (Clumeck, 1975). When there
was a question about the location of a particular
boundary, it was resolved through listening to the
segments in question. The most common
segmentation difficulty arose in determining the location
of the end of the word. A consistently conservative
criterion was applied, such that the termination of
periodicity was used to mark the end point. This
excluded breathy releases, but seemed best for
consistent comparisons across speakers.

In order to test the reliability of the duration 
measurements, a random sample of 12 French 
reiterant utterances containing 82 separate 
measurements were measured a second time by 
the author. Absolute duration measurement 
differences were within 4 ms of the original on the 
average overall and within 9 ms on the average on
the 12 final vowel measurements. 

Not all individuals are equally adept at
producing reiterant speech that faithfully mimics
the prosodic characteristics of the original
utterances. To construct accurate timing models,
we must require that the reiterant utterances
chosen for analysis come from subjects who have
demonstrated that they are capable of
neutralizing inherent segmental length
differences. That is, the subject must produce
reiterant syllables of the same length, all other
things being equal, for both original syllables that
are inherently long and for ones that are
inherently short. Reiterant speech studies
typically use specially constructed sentences that
are rhythmically matched, based on their stress
patterns, although one sentence of each pair
contains words with inherently long syllables and
one sentence contains words with inherently short
syllables. Thus, the sentences in each pair are
rhythmically the same, with the same number of
syllables and the same locations for stressed
syllables, but the individual syllables vary in
length. Subjects should produce essentially
identical reiterant productions for both sentences
in a set, if, in fact, they are neutralizing intrinsic
differences in the durations of individual
segments.

In the present study, each of the two-, three-,
four- and five-syllable word-length types had
syllables composed of segments of inherently
different lengths. Thus, instead of using a sentence-
length test, measures of subjects' duration
measurement variability in producing word types were
used as an indication of their ability to neutralize
inherent segmental length differences. Each
reduplicative version of a particular word of a
given length was considered a token of that word-
length type. The standard deviations for
comparable measurements, e.g., first syllable length,
were calculated across tokens for each subject for
each word-length type and averaged. Separate
values were calculated for each of the four word-
length types because it is generally more difficult
to produce good reiterant versions for longer
utterances. Finally, an overall mean (measure A)
and a standard deviation (measure B) of each
subject's mean standard deviations for the four
French word-length types were calculated. The
overall group mean was 25 ms for measure A and
20 ms for measure B. Subjects were rank ordered
on both measures, and three subjects, one female
and two males, showed means and standard
deviations that were consistently larger than those of
the other subjects (35 ms for measure A and 33 ms for
measure B for the group of three). They had also
produced more errors between them (16) than the
other seven subjects combined. Their data were
excluded from the construction of the French
baseline measures for timing. For the remaining
seven subjects, the mean for measure A was 19 ms
with a mean of 14 ms for measure B.
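
The consistency screening described above can be sketched in a few lines. This is an illustration under stated assumptions, not the original analysis: the text does not say whether sample or population standard deviations were used (the sample form is assumed here), the function names are invented, and the durations are made up.

```python
from statistics import mean, stdev

def mean_sd_for_type(tokens):
    """Average, over measurement positions, of the SD across tokens.

    `tokens` is a list of tokens of one word-length type; each token is a
    list of durations in ms, one per comparable measurement (for example
    syllable 1, syllable 2, ...).
    """
    by_position = zip(*tokens)  # regroup the durations by position
    return mean(stdev(column) for column in by_position)

def consistency_measures(subject_data):
    """Return (measure A, measure B) for one subject.

    `subject_data` maps a word-length type to that subject's tokens.
    A is the mean, and B the standard deviation, of the per-type mean SDs.
    Sample standard deviations are an assumption of this sketch.
    """
    per_type = [mean_sd_for_type(tokens) for tokens in subject_data.values()]
    return mean(per_type), stdev(per_type)

# Made-up syllable durations (ms) for one hypothetical subject.
subject = {
    2: [[170, 280], [180, 270], [176, 275]],
    3: [[172, 178, 270], [180, 170, 282], [175, 174, 276]],
}
a, b = consistency_measures(subject)
print(f"measure A = {a:.1f} ms, measure B = {b:.1f} ms")
```

Subjects can then be rank ordered on both values, with larger values indicating less consistent reiterant productions.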

E. Results. There were only seven errors made
across the seven subjects (3%), most of which
involved the addition or deletion of a syllable,
usually on words of four or five syllables. All
errors were excluded from the construction of the
baseline measures for timing. There were also two
instances of missing data (1%).

Figure 1 shows the mean durational
measurements for the syllables of the reiterant
versions of each of the four word-types in terms of
the mean durational measurements of the
consonants (/m/) and vowels (/a/) of each syllable.
The mean duration of /m/ in nonfinal syllables was
83 ms, of /m/ in final syllables was 103 ms, of /a/ in
nonfinal syllables was 93 ms and of final /a/ was
171 ms. Nonfinal syllables averaged 175 ms in
length, whereas final syllables measured 274 ms
on the average, an increase of almost 100 ms or a
final/nonfinal ratio of 1.6.
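
The arithmetic behind the reported final/nonfinal ratio can be checked directly from the means above, along with the consonant-to-vowel ratios and the share of each syllable taken up by /m/ that those means imply. The variable and function names are illustrative only; the durations are the ones reported in the text.

```python
# Mean durations (ms) reported in the text for the seven French speakers.
SEGMENTS = {
    "nonfinal": {"m": 83, "a": 93},
    "final": {"m": 103, "a": 171},
}
SYLLABLES = {"nonfinal": 175, "final": 274}  # reported syllable means

def cv_ratio(position):
    """Consonant-to-vowel duration ratio for a syllable position."""
    seg = SEGMENTS[position]
    return seg["m"] / seg["a"]

def consonant_share(position):
    """Proportion of the mean syllable duration occupied by /m/."""
    return SEGMENTS[position]["m"] / SYLLABLES[position]

final_to_nonfinal = SYLLABLES["final"] / SYLLABLES["nonfinal"]

print(f"final/nonfinal ratio: {final_to_nonfinal:.1f}")
for position in ("nonfinal", "final"):
    print(position,
          f"C/V = {cv_ratio(position):.1f},",
          f"/m/ share = {consonant_share(position):.0%}")
```

Note that the segment means for nonfinal syllables sum to 176 ms rather than the reported 175 ms, presumably because each mean was rounded separately.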




Figure 1. Consonant and vowel durations, as a function of word length, syllable position, and stress, for reiterant
productions of French words spoken in isolation by native speakers of French. (Numbers indicate syllable position, S
indicates stressed syllables, and W indicates unstressed syllables).

This final-syllable lengthening was found to be
significant in the results of a two-way analysis of
variance comparing the subjects' mean nonfinal
and final syllable lengths for the four word-length
types [F(1,6)=130.19, p < 0.00001]. There was no
effect of word-length type and no word-length type by
syllable position interaction. Analyses comparing
subjects' mean nonfinal syllable lengths for each of
the four word-length types were also not
significant. A separate two-way analysis of
variance to explore segment length in final and
nonfinal syllables again showed a highly
significant effect of syllable position [F(1,6)=106.8,
p < 0.0000]. There was also a significant effect of
segment type [F(1,6)=46.01, p < .0005], and a
syllable by segment type interaction [F(1,6)=60.56,
p < 0.0002]. Post hoc tests (Newman-Keuls)
revealed that final /a/ was significantly different
from nonfinal /a/ and from final and nonfinal /m/
and that final /m/ was significantly different from
nonfinal /m/, all at the p < 0.06 level or better.
Nonfinal /m/ and /a/ were not significantly
different from one another.

Table 1 shows the mean length of each of the
word types and the ratio of the mean length of the
consonant to that of the vowel in each syllable.
The overall mean C/V ratio was .9 for nonfinal
syllables and the C/V ratio was .6 for final
syllables. In addition, Table 1 presents the ratios
of the mean syllable length to the word as a whole.



Table 1. Mean word lengths (in ms) and C/V and
CV/Length ratios in reiterant speech productions of
French words by native speakers of French.

                     Word Length in Syllables
                     Two      Three    Four     Five
Mean Word Length     448.2    611.6    776      1027

Ratios
C1/V1                .9       .9       .8       .9
C2/V2                .6       .8       .9       1.0
C3/V3                         .6       .9       1.0
C4/V4                                           .9
C5/V5

Ratios
CV1/L                .4       .3       .2       .2
CV2/L                .6       .3       .2       .2
CV3/L                         .4       .2       .2
CV4/L                                  .4       .2
CV5/L                                           .3



F. Discussion

The results of this experiment showed fairly
good agreement with the published data on
French, especially with respect to French syllable
duration ratios. The segment measurements will
be considered first and then the syllable
measurements.

The duration measurements for French nonfinal
/m/ and /a/ and for final /m/ tended to be roughly
20 ms longer than the durations found for the
same segments by other researchers (Di Cristo,
1980; O'Shaughnessy, 1984; Smith, 1977). This
discrepancy is most likely due to the fact that the
subjects in the present experiment spoke at a
slower rate in producing reiterant speech than the
subjects in the other studies, who read French
texts. The measurement for utterance-final /a/ was
roughly 10 ms longer than that of O'Shaughnessy
(1984). The smaller discrepancy in final position is
probably due to the conservative segmentation
criterion adopted in the present study. Thus,
given the segment values of the present study, the
nasal consonant /m/ accounted for 47% of the
duration of nonfinal syllables, whereas for final
syllables, it accounted for 38%.

In general, nonfinal syllables were remarkably
close in duration (see Figure 1). The present data
did not show an initial syllable shortening as
compared to medial syllables, which disagrees
with Crompton's (1980) finding of decreased
length for initial syllables. In fact, another
researcher (Vaissière, 1983) has found growing
evidence in French of a tendency to stress word
initial syllables, and presumably to lengthen
them. Indeed, one of the subjects showed a regular
lengthening of initial syllables. Crompton (1980)
also found evidence for prenuclear lengthening, or
lengthening of a syllable just prior to a nuclear
stress. An analogous penultimate syllable
lengthening has been described by Smith (1977)
as characteristic of Parisian French (although
only one of Crompton's four subjects was from
Paris, while the other three came from Brittany).
The present pooled data show no overall effect of
penultimate syllable lengthening, although data
from two of the speakers do show such an effect.

The ratio of final syllable to nonfinal syllable
length in the present data was 1.6, which agrees
exactly with Parmenter and Blanc's measure of
1.6 (1933), with Benguerel's (1971) measure of 1.6,
and with Allen's (1983) finding of an overall ratio
of 1.6 when he compared the median lengths of
final to penultimate vowels in French children's
productions of French words. It does not match
Delattre's (1966) measure of 1.8, perhaps because
of differences in the criteria used for measuring
final syllable lengths.

In summary, our French timing data based on
reiterant speech productions of French words
spoken in isolation showed generally consistent
syllable durations for nonfinal syllables and a
ratio of final/nonfinal syllables of 1.6. Individual
subjects showed some slight lengthening of initial
or penultimate syllables, but no consistent
evidence for any shortening effects. Insofar as
intrasyllabic timing is concerned, in nonfinal
syllables, the nasal accounted for 47% of the
duration, and in final syllables, it accounted for
38%. How well then do non-native speakers of
French match these characteristic duration
patterns when they produce reiterant speech
versions of French words?

II. EXPERIMENT 2

A. Subjects. Ten subjects, five male and five
female, participated in the study. All of the
subjects except for one have advanced graduate
degrees. All are native speakers of English,
currently living in the Boston area, who have
studied standard French. Four of the subjects (two
men and two women, including the author) teach
French at the university level. One subject
learned French from his French wife, whom he
met after graduate school. The other subjects all
had some formal training in French; seven
subjects began the study of French in high school
and the remaining two in junior high school. The
four teachers of French and the other subjects,
with the exception of the subject who learned
French at home, averaged over two years of high
school French. The four French teachers, however,
studied French for four years in college, as
compared to an average of slightly over 1 1/2 years
in college for the others. The four French teachers
also completed postgraduate training in French
and had traveled more extensively in French-
speaking countries than had the other subjects.

B. Test materials. The same French deck of 3 x 5
cards used in the previous experiment was used in
this second study. An additional deck consisting of
the English cognates of the French words was also
used. The 30 English words consisted of two,
three, four or five syllables. There were ten
possible stress patterns represented. For words of
two syllables, both initial and final primary stress
patterns occurred (sacred and dkgrt). For words
of three syllables, initial, medial and final primary
stress patterns occurred (compliment, instructive,
and engineer). For words of four syllables, three of
the four possible primary stress patterns occurred
(commentary, economy, and e:gHmtion). For words
of five syllables, two possible patterns occurred
(electricity and communication). There were three
different words representing each of the syllable
and stress types. Although in general most of the
cognates had the same number of syllables in the
two languages, there were three items for which
the syllable count differed. (See the Appendix for a
complete list of the stimuli used).

C. Procedure. Subjects first filled out a short
questionnaire about their years of experience with
French and were then recorded in a quiet room,
onto a Teac tape recorder (model X-7MKII) using
a Realistic dynamic microphone (model 33-984A).
The rest of the procedure was the same as in the
previous experiment, except that subjects read
and produced reiterant versions of the words of the
English deck first.

D. Equipment and measurement methods. All 30
French and 30 English words and their reiterant
versions were low-pass filtered at 4.9 kHz,
digitized at 10 kHz, and stored on disk on Haskins
Laboratories' Vax 11/730. The same criteria used
in the previous experiment were used here to
determine the consonant and vowel boundaries
and the end of the reiterant speech utterance.

A random sample of fourteen reiterant
productions of English words containing 102
separate measurements were measured a second
time. The absolute duration measurements were
within 4 ms of the original measures on the
average overall, and within 9 ms on the average
for the fourteen final vowel measurements.

The errors from both sets of reiterant
productions will be discussed first. The data from
Experiment 2 will then be presented as a set of
baseline measures for consonant, vowel, and
syllable timing for English words of various lengths
and stress patterns based on the productions of
the most consistent reiterant speakers. Third, the
English speakers' reiterant versions of the French
words will be examined for patterns of intra- and
intersyllabic timing. Finally, the durations of the
productions of the French native speakers will be
statistically compared to those of the non-native
speakers, broken into two groups, the relatively
experienced teachers of French and the other, less
experienced group of French learners.

As with the French subjects, measures of the
American subjects' duration measurement
variability in producing word types were used as an
indication of their ability to neutralize inherent
segmental length differences. Each reduplicative
version of a particular word of a given length and
stress pattern was considered a token of that
word-length/stress-pattern type. The standard
deviations for comparable measurements, e.g., first
syllable length, were calculated across tokens for
each subject for each of the ten word-
length/stress-pattern types and averaged.
Separate values were calculated for each of the
ten word-length/stress-pattern types because it is
generally more difficult to produce good reiterant
productions for longer utterances and because
variable word stress in English affects the
duration of syllables in comparable positions. Finally,
an overall mean (measure A) and a standard
deviation (measure B) of each subject's mean standard
deviations for the ten word-length/stress-pattern
types were calculated. For the English words, the
group mean on measure A was 18 ms with a group
mean on measure B of 17 ms. When the subjects
were rank ordered on these two measures, two
subjects, one male and one female, showed the
highest scores on both measures (for measure A,
their mean was 26 ms, with a mean of 24 ms for
measure B). The remaining eight subjects showed
a group mean of 17 ms on measure A and 15 ms
on measure B. In constructing the baseline
measures for timing for the English words, only the
data from the eight most consistent subjects were
included.

Results 

The American subjects made relatively few errors in their reiterant
versions of the English words. The twelve errors across the eight
most consistent subjects gave an error rate of 5%, with most errors
due to a subject's producing an incorrect number of syllables for
one of the longer words or to a subject's clearly stressing the
wrong syllable in the reiterant production. There were only two
missing tokens (.8%). The American subjects made many more errors
in their reiterant versions of the French words. There were
twenty-nine such errors (12%) across the eight subjects.
Twenty-four of those errors (83% of the total) were words ending in
"-ion" or containing the vowel sequence as in "société," which the
French count as a single syllable, but which many of the Americans
counted as two. There was only one missing token (.4%).
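
The reported percentages can be checked with a few lines of arithmetic. The sketch below assumes that each of the eight subjects produced one token for each of the 30 words listed per language in the Appendix (240 tokens per language); that token count is an inference from the word list, not a figure stated in the text.

```python
SUBJECTS = 8
WORDS_PER_LANGUAGE = 30                 # assumption: one token per Appendix word
TOKENS = SUBJECTS * WORDS_PER_LANGUAGE  # 240 tokens per language


def pct(count, total):
    """Express count as a percentage of total."""
    return 100.0 * count / total


english_error_pct = pct(12, TOKENS)    # errors on English words
english_missing_pct = pct(2, TOKENS)   # missing English tokens
french_error_pct = pct(29, TOKENS)     # errors on French words
french_missing_pct = pct(1, TOKENS)    # missing French tokens
syllable_count_share = pct(24, 29)     # share of French errors due to syllable count

print(f"English errors: {english_error_pct:.1f}%")
print(f"French errors:  {french_error_pct:.1f}%")
print(f"Syllable-count share of French errors: {syllable_count_share:.1f}%")
```

Under this assumption the computed values round to the percentages given in the text.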

Figure 2 presents the averaged durational measurements of the eight
American speakers for each of the ten word types as a function of
the consonants (/m/) and vowels (/a/). For initial stressed
syllables,⁴ /m/ averaged 56 ms and /a/ 92 ms; for medial stressed
syllables, /m/ averaged 79 ms and /a/ 108 ms; for final stressed
syllables, /m/ averaged 82 ms and /a/ 255 ms. For unstressed
syllables, /m/ averaged 45 ms and /a/ 70 ms in initial syllables,
/m/ was 65 ms and /a/ was 76 ms in medial syllables, and /m/ was 79
ms and /a/ was 155 ms in final syllables. The mean durations of
syllables bearing primary stress⁵ were 160 ms in initial position,
190 ms medially, and 336 ms finally. Syllables with secondary
stress averaged 137 ms initially and 168 ms medially. Syllables
that were not stressed averaged 113 ms initially, 138 ms medially
and 233 ms finally.

Table 2 shows the overall mean length for each
word type, the consonant/vowel ratios for each
syllable and the ratios of each of the individual
syllables to the length of the word.

Figure 3 shows the mean durational measurements for the reiterant
versions of the syllables of each of the four French word-length
types, as produced by the native speakers of English, in terms of
consonants (/m/) and vowels (/a/). The mean duration of /m/ in
nonfinal syllables was 73 ms, of /m/ in final syllables was 95 ms,
of /a/ in nonfinal syllables was 85 ms, and of /a/ in final
syllables was 235 ms. Nonfinal syllables thus averaged 157 ms,
whereas final syllables averaged 330 ms. The difference in syllable
length averaged over 170 ms and produced a final/nonfinal ratio of
2.1.
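
As a rough check, the syllable means and the final/nonfinal ratio can be rebuilt from the rounded segment means quoted in this paragraph. This is only a sketch: the variable names are illustrative, and summing rounded means gives 158 ms for nonfinal syllables rather than the reported 157 ms, which was presumably computed from unrounded data.

```python
# Mean segment durations (ms) for the Americans' reiterant French words.
nonfinal = {"m": 73, "a": 85}
final = {"m": 95, "a": 235}

nonfinal_syllable = sum(nonfinal.values())  # consonant + vowel, nonfinal
final_syllable = sum(final.values())        # consonant + vowel, final

difference = final_syllable - nonfinal_syllable
ratio = final_syllable / nonfinal_syllable

print(f"nonfinal syllable: {nonfinal_syllable} ms")
print(f"final syllable:    {final_syllable} ms")
print(f"difference:        {difference} ms")
print(f"final/nonfinal:    {ratio:.1f}")
```

Most of the difference comes from the vowel: final /a/ is 150 ms longer than nonfinal /a/, while final /m/ is only 22 ms longer than nonfinal /m/.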

The results of a two-way analysis of variance comparing the
subjects' mean nonfinal and final syllable lengths for the four
word-length types showed a highly significant effect of syllable
position [F(1,9)=182.22, p < 0.0000], but no effect of word-length
type and no word-length type by syllable position interaction.
Separate analyses comparing subjects' mean nonfinal syllable
lengths for each of the four word-length types were also not
significant.⁶

Table 3 shows the mean length of each of the
word-length types and the ratio of the mean
length of the consonant to that of the vowel in
each syllable. The overall mean C/V ratio was .9
for nonfinal syllables, which was comparable to
that of the French subjects, but the overall mean
C/V was .45 for final syllables, which was different
from that of the French subjects.

In order to test how well the American subjects 
conformed to the French baseline measures for 
timing for nonfinal and final syllables in their 
reiterant productions of French words, their 
timing measures were subjected to an analysis of 
variance with one between group factor with three 
levels (native French versus teachers of French 
versus English speakers) and two within group 
factors (syllable position [nonfinal versus final] 
and segment duration [consonant versus vowel 
length]). 



[Figure 2. Panels: "Two- and three-syllable words"; "Four- and five-syllable words". Legend: V, C.]

Figure 2. Consonant and vowel durations, as a function of word length, syllable position, and stress, for reiterant
imitations of English words spoken in isolation by native speakers of English. (Numbers indicate syllable position, S
indicates stressed syllables, and W indicates unstressed syllables or those bearing secondary stress).

Table 2. Mean word lengths (in ms) and C/V and CV/Length ratios in reiterant speech productions of English words
by native speakers of English.

Word Length in Syllables   Two            Three                  Four                   Five
Stress Type                1      2       3      4      5       6      7      8       9      10
Mean Word Length           408.0  457.1   552.4  542.3  624.4   663.0  651.3  703.3   825.7  861.?

Ratios
C1/V1                      .5     .6      .6     .6     .7      .7     .6     .6      .6     .7
C2/V2                      .5     .3      .9     .7     .9      .8     .8     .9      .8     .9
C3/V3                                     .5     .6     .4      .9     .8     .7      .8     .8
C4/V4                                                           .8     .3     ?       .8     .7
C5/V5                                                                                 .6     ?
CV1/L                      .4     .2      .3     .2     .2      .2     .2     .2      .1     .1
CV2/L                      .6     .8      .3     .4     .2      .2     .3     .2      .2     .2
CV3/L                                     .5     .4     .5      .3     .2     .3      .2     .2
CV4/L                                                           .3     .4     .3      .2     .2
CV5/L                                                                                 .3     .3

Tokens of types: 1=counter; 2=control; 3=compliment; 4=conclusion; 5=engineer; 6=commentary; 7=economy; 8=exposition;
9=elasticity; 10=communication.

Figure 3. Consonant and vowel durations, as a function of word length, syllable position, and stress, for reiterant
French words spoken in isolation by non-native speakers. (Numbers indicate syllable position, S indicates stressed
syllables, and W indicates unstressed syllables).



Table 3. Mean word lengths (in ms) and C/V and
CV/Length ratios in reiterant speech productions of
French words by native speakers of English.

Word Length in Syllables   Two     Three   Four    Five
Mean Word Length           500.5   656.1   786.1   943.6

Ratios
C1/V1                      .7      .8      1.0     1.0
C2/V2                      .4      .9      1.0     .9
C3/V3                              .4      .8      1.0
C4/V4                                      .5      .9
CV1/L                      .3      .2      .2      .2
CV2/L                      .7      .3      .2      .2
CV3/L                              .5      .2      .2
CV4/L                                      .4      .2
CV5/L                                              .2

Although there was no significant main effect of group, there was a
significant effect of syllable position [F(1,17) = (value
illegible), p < 0.0000] and of segment duration [F(1,17)=121.42,
p < 0.0000], and both of these effects interacted significantly
with the group factor [F(2,17)=15.41, p < 0.0003], in the case of
syllable position, and [F(2,17)=8.28, p < .0032], in the case of
consonant versus vowel length. There was also a significant two-way
interaction of syllable position and segment duration
[F(1,17)=145.20, p < 0.0000] that also interacted significantly
with the group factor [F(2,17)=10.88, p < 0.001]. Figure 4 shows
the pattern of results for the three groups.

An exploration of the group interactions with syllable position and
consonant versus vowel revealed that the source of the interactions
was the differences in final syllable length among the three
groups, in particular due to differences in the vowel length, as
can be seen in Figure 4. A separate analysis of variance conducted
on final syllable vowel length was significant [F(2,17)=7.65,
p < .0044]. Post hoc (Newman-Keuls) tests revealed that in terms of
final vowel length, the productions of the native speakers of
French and the French teachers did not differ from one another but
the productions of both groups differed from those of the other
native English speakers (p<.06).

F. Discussion

The American subjects' productions of the
English segment and syllable durations will first
be discussed, followed by an examination of the
ways in which their reiterant productions of the
French words deviate from the French baseline
measures. Finally the possible effects of English
timing patterns on the French productions will be
considered.

In the English reiterant speech, the nasal murmur accounted for 38%
of the syllable in stressed initial syllables, 42% in stressed
medial syllables and 24% in stressed final syllables. For
unstressed syllables the percentages were 39% initially, 45%
medially and 34% finally. These percentages clearly differ from
those found in French in Experiment 1, which suggests that the
intrasyllabic timing is not the same in the two languages.
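
The percentages above follow from the mean /m/ and /a/ durations reported in the Results. The sketch below recomputes them from those rounded means; the dictionary layout and the function name `nasal_share` are illustrative choices, and because the inputs are rounded the recomputed values can differ from the reported ones by about one percentage point (medial unstressed syllables come out near 46% rather than 45%).

```python
# (mean /m/ duration, mean /a/ duration) in ms, from the Results section.
english = {
    ("stressed", "initial"): (56, 92),
    ("stressed", "medial"): (79, 108),
    ("stressed", "final"): (82, 255),
    ("unstressed", "initial"): (45, 70),
    ("unstressed", "medial"): (65, 76),
    ("unstressed", "final"): (79, 155),
}

# Percentages as reported in the Discussion.
reported = {
    ("stressed", "initial"): 38,
    ("stressed", "medial"): 42,
    ("stressed", "final"): 24,
    ("unstressed", "initial"): 39,
    ("unstressed", "medial"): 45,
    ("unstressed", "final"): 34,
}


def nasal_share(m_ms, a_ms):
    """Percentage of the /ma/ syllable taken up by the nasal murmur."""
    return 100.0 * m_ms / (m_ms + a_ms)


shares = {key: nasal_share(*segments) for key, segments in english.items()}
for (stress, position), share in shares.items():
    print(f"{stress:10s} {position:8s} {share:5.1f}%  (reported {reported[(stress, position)]}%)")
```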

There was also clearly an effect of utterance-
final lengthening carried largely by the vowel in
the English data. For stressed syllables,
lengthening for final vowels was roughly 160 ms
and for unstressed syllables it was roughly 75 ms.
These durational lengthenings are comparable to
those found by Oller (1973).⁷

Insofar as the syllable measurements are concerned, the present
data showed clear effects both of stress and of utterance-final
lengthening. There also appeared to be increments due to secondary
stress, although Nakatani et al. found only marginal increases in
length for such syllables and only for some speakers. The ratio of
final/nonfinal syllables was 1.7, which is greater than the 1.5
found by Delattre (1966), but which may be due to the unusually
short initial syllables found in this study. Indeed, if initial
syllables are eliminated from consideration, the ratio becomes 1.6,
which is closer to Delattre's measure. The ratio of accented to
unaccented syllables was 1.43 in initial syllables, 1.38 in medial
syllables and 1.44 in final syllables. These ratios, which do not
include the somewhat problematic syllables that bear secondary
stress, correspond fairly well to Hoequist's measure of 1.45,
although they are lower than the measure given by Delattre (1966)
of 1.7. Hoequist's (1983) suggestion that Delattre's higher ratio
is due to the inclusion in the unstressed group of very short /ə/
syllables, which are generally not found in reiterant speech, seems
quite reasonable.

As can be seen in Figure 4, for the reiterant versions of the
French words, there was little difference in the consonant and
vowel lengths in nonfinal syllables for the three groups. Thus, the
percentage represented by the nasal in nonfinal syllables was 47%
for the native speakers of French, 49% for the American teachers of
French, and 44% for the less experienced French speakers. There was
also little difference in the mean length of /m/ in final syllables
for the three groups of subjects. The striking difference in the
reiterant productions of the three groups occurs in the length of
utterance-final /a/ which was 171 ms for the French natives, 199 ms
for the French teachers, and 260 ms for the less experienced group.
Thus, the nasal consonant accounts for 38% of the final syllable
for French natives, 33% for French teachers, and only 26% for the
less experienced group. Intrasyllabic timing appears to be more
native-like in nonfinal than in final syllables. The ratio of final
to nonfinal syllables was 1.6 for the French natives, 1.9 for the
French teachers, and 2.2 for the others. Although the reiterant
productions of the American teachers of French were not
significantly different from those of the French natives, in almost
all cases, the teachers' productions, while close to those of the
French natives, fall between that group and the other group of
native speakers of English.

Surprisingly, the Americans had a durational pattern in their
reiterant versions of English words that turned out to be very
close to the French timing pattern. Thus, the average duration of
the first syllable in two syllable words with stress on the first
syllable (see Figure 2) was 173 ms while the final syllable was 236
ms on the average, which is comparable to the French natives' 176
ms average length for nonfinal syllables and 274 ms average length
for final syllables. Yet many of the Americans who were less
experienced in French seemed to match the durational pattern of the
final syllable of French words uttered in isolation (353 ms) by
patterning it after the duration of their own stressed syllables in
final position (336 ms) whereas the teachers of French achieved a
closer match to the French baseline measure (296 ms).
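
The group figures in this section are mutually consistent, as the following sketch suggests. It reads the 353 ms and 296 ms values as the mean final-syllable durations of the less experienced group and of the teachers, respectively (an interpretation of the sentence above, not something stated separately in the text), and derives the final /m/ duration and the nasal share of the final syllable from the final /a/ durations reported earlier in the Discussion.

```python
# group: (final-syllable duration, final /a/ duration), both in ms.
groups = {
    "French natives": (274, 171),
    "French teachers": (296, 199),
    "less experienced": (353, 260),
}

results = {}
for name, (syllable_ms, vowel_ms) in groups.items():
    nasal_ms = syllable_ms - vowel_ms        # implied final /m/ duration
    share = 100.0 * nasal_ms / syllable_ms   # nasal share of the final syllable
    results[name] = (nasal_ms, share)
    print(f"{name:17s} /m/ = {nasal_ms:3d} ms, nasal share = {share:4.1f}%")

# Final/nonfinal ratio for the French natives (176 ms nonfinal syllables).
native_ratio = 274 / 176
print(f"French natives final/nonfinal = {native_ratio:.1f}")
```

The implied final /m/ durations stay within about 10 ms of one another across groups, while the final vowel grows by almost 90 ms from the French natives to the less experienced learners, which is the pattern the Discussion describes.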

Insofar as the nonfinal syllables are concerned, 
all the Americans showed that they can generally 
produce syllables of quite equal length (see Figure 
3), and there was no indication in their reiterant
versions of French of the systematic initial 
syllable shortening that was found with the same 
subjects in the English reiterant productions, 
although some individual subjects continued to 
show such a pattern. 

Thus, the American teachers of French produced 
reiterant timing patterns that, while not identical 
to those of the native French speakers, did not
differ significantly from them. On the other hand, 
the American teachers of French and the French 
natives both produced final vowel timing patterns 
that were significantly different from those of the 
other Americans. 

G. General Discussion 

There is a growing body of acoustic-phonetic literature that
suggests that the non-native productions of late second language
learners are influenced, sometimes in subtle ways, by their native
language speech patterns (see Flege, 1986, for a review). Most of
the research has focused on the analysis of the phonetic
characteristics of bilingual speech. Thus the influence of native
language phonetic habits has been demonstrated for voice onset time
(VOT) in stop consonants for English/French bilinguals (Flege &
Hillenbrand, 1984) and for Arabic/English bilinguals (Flege & Port,
1981), because bilinguals show a range of VOT values when speaking
their second language that are intermediate between the values
produced by monolingual native speakers of the two languages.
Native language influences have also been shown for English vowel
durations that depend on the voicing of the final consonant,
because French/English bilinguals showed vowel durations, when
speaking English, that were closer to those of French monolinguals
(which vary less with respect to the voicing of a syllable-final
consonant) than to those of English-speaking monolinguals (Mack,
1982).

A similar effect of the rhythmic pattern of the native language on
the acquisition of the rhythmic patterns of English by native
speakers of French has been found by Wenk (1985) who has described
his subjects as passing through a transitional "interlanguage"
phase, characterized by features of both language systems.
Intermediate-level speakers of French who were learning English
apparently mastered post-tonic reduced vowels (as in matter) before
pre-tonic reduced vowels (as in Japan), when their productions of
such words were judged by native speakers of English. In the
present study, native speakers of English who have studied French
appear to master the relatively equal durations of nonfinal
syllables in French before they master the appropriate French final
syllable length, because both groups of American subjects produced
essentially equal nonfinal reiterant syllables in French, but only
the more experienced group of American subjects, the teachers of
French, also produced French-like final syllables. Flege (e.g.,
Flege, 1981; Flege 1987; Flege & Hillenbrand, 1984) has
hypothesized that second language learners may acquire more rapid,
accurate pronunciation of a sound that is totally foreign to their
native repertoire, because they are unable to assimilate it to one
of their native phonemes. Equally-timed nonfinal syllables are not
typical of English words, whereas final-syllable stress does occur.
Perhaps native speakers of English who learn French are more
successful in producing essentially equal nonfinal syllables in
their reiterant versions of French than in producing the correct
final-syllable lengthening, because the former pattern is more
foreign to their native repertoire.

Many have argued that language learners who begin their study of a
second language relatively late fail to master fully the phonetic
details of that second language because of biological limitations
imposed by a critical or sensitive period for speech acquisition
(Lenneberg, 1967; Long, 1990; Oyama, 1979; Scovel, 1988). The
notion of a critical period for language acquisition is a strong
one and describes a period that is genetically determined, clearly
delimited, and not susceptible to the influence of the environment.
The notion of a sensitive period for language acquisition, on the
other hand, while still a maturational effect, is subject to
greater variability, including a less clearly delimited time-frame.
Although for some researchers in the field, the onset of
adolescence (roughly twelve years of age) was seen as the point
after which second language learners were likely to speak their
non-native language with a notable foreign accent, others have
pushed the age for acquisition of a foreign accent back to six, at
least for some individuals (see Long, 1990, for a review). Indeed,
Long (1990) has written:

Thus, while somewhat weaker than the claim for a critical period
for first language learning, the claim for a sensitive period for
second language acquisition is still a strong and interesting one.
The maturational processes underlying it are held to be universal.
Hence, learners who begin a second language after its supposed
closure (which will here be claimed to be as early as age 6 for
phonology in many individuals and around 15 for morphology and
syntax), and who nevertheless attain native-like ability in those
areas, will falsify the hypothesis (p. 253).

However, all of the native speakers of English in
the present study were late learners of French
(beginning in junior high school at the earliest),
yet the more experienced group of learners
(American teachers of French) produced timing
patterns that were not significantly different from
those of the native French speakers.

Two possible explanations for this pattern of results can be
suggested. Either the acquisition of second-language rhythm
patterns is exempt from the sensitive period constraint or factors
such as length of exposure, training, language aptitude, or
motivation may play an important role. Whereas there has been
little empirical investigation of the first hypothesis, the role of
experience and training has been supported by a number of studies.
For example, Wenk (1985) found that his advanced French students of
English, unlike those at the intermediate level, had mastered the
vowel reduction patterns associated with English word stress.
Similarly, Flege and Eefting (1987) found that Dutch speakers of
English who majored in the subject were judged to have
significantly better pronunciation scores than Dutch students of
English who studied to become engineers, although both groups'
productions were judged to be significantly different from those of
native English speakers. As in the present study, however,
experience may have been confounded with aptitude. The English
majors, like the university-level teachers of French in the present
study, were more experienced second-language learners, but they
also probably had greater aptitude for second-language learning. In
fact, aptitude rather than experience may be the source of the
performance of the group of French teachers. However, in either
case, if good reasons for exempting the acquisition of
second-language rhythm patterns from the sensitive period
constraint are not found, then these results call into question the
notion of a sensitive period as currently formulated.

Future research needs to compare directly
second-language segmental and rhythmic
learning, to see if rhythmic patterns are easier to
acquire, and to determine the relative
contribution of rhythmic and phonetic factors to
the detection of non-native pronunciation.

FOOTNOTES

*Journal of the Acoustical Society of America, 90(6), 3008-3018 (1991).

†Also Wellesley College.

¹All ratios reported in the paper are to 1.

²Results of these analyses of variance were essentially the same,
even when all ten original subjects were included. The only
significant effect was that of syllable position [F(1,9)=121.16, p <
.0000]. None of the other effects were significant.

³(In the case of five-syllable words, there were actually four words
representing one of the five-syllable word types and two words
representing the other.)

⁴For comparability with Oller (1973) secondary stress syllables
were grouped with unstressed syllables.

⁵The syllables were here divided into those with primary,
secondary and no stress for comparability with Nakatani et al.
(1981). The two initial syllables of the second set of five syllable
words had complementary stress patterns (one of the words had
a secondary stress where the other had no stress and vice versa),
so the averaged durations of those syllables were excluded from
these calculations.

⁶The results of this analysis and all subsequent analyses include
all of the original subjects from both groups. Similar analyses
including only the subjects who produced the most consistent
reiterant speech produced essentially the same results.

⁷However, the present data exhibit a consistent effect of initial
syllable shortening (see Figure 2), which disagrees with findings
by Oller (1973), Klatt (1976) and Nakatani et al. (1981). The most
likely explanation for this discrepancy is that the reiterant
productions in this study were produced as citation forms,
rather than in a sentence frame. The present study used citation
forms in order to reduce the number of syllables that subjects
needed to remember for the reiterant production of individual
words (but cf. Nakatani et al., 1981 for a different method). It
may be the case that the sentence frame gives extra prominence
to the word to be imitated and that such prominence results in
the pattern of word-initial syllable length found in the other
studies.

APPENDIX

French Words

Two syllables: comptoir, sacré, progrès, contrôle, surprise, degré

Three syllables: compliment, instrument, solitude, ingénieur,
indiscret, japonais, conclusion, instructif, solution, commentaire,
légendaire, société

Four syllables: télévision, économie, publicité, exposition,
population, satisfaction

Five syllables: automatiquement, élasticité, électricité,
possibilité, communication, civilisation

English Words (Stress Pattern)

Two syllables: counter (SW), sacred (SW), progress (SW),
control (WS), surprise (WS), degree (WS)

Three syllables: compliment (SWW), instrument (SWW), solitude (SWW),
engineer (WWS), indiscrete (WWS), Japanese (WWS), conclusion (WSW),
instructive (WSW), solution (WSW)

Four syllables: commentary (SWWW), legendary (SWWW),
television (SWWW), society (WSWW), economy (WSWW), publicity (WSWW),
exposition (WWSW), population (WWSW), satisfaction (WWSW)

Five syllables: automatically (WWSWW), elasticity (WWSWW),
electricity (WWSWW), possibility (WWSWW), communication (WWWSW),
civilization (WWWSW)

(S=primary stress, W=secondary stress or no stress)

Syllable-internal Structure and the Sonority Hierarchy:
Differential Evidence from Lexical Decision,
Naming, and Reading*

Andrea Levitt,† Alice F. Healy,†† and David W. Fendrich†††



Treiman (e.g., 1983) and others have argued that spoken syllables are best characterized,
not as linear strings of phonemes, but as hierarchically organized units consisting of an
onset (initial consonant or consonant cluster) and a rime (the vowel and any following
consonants) and that the rime is further divided into a peak or nucleus (the vowel) and a
coda (the final consonants). It has also been argued that the sonority (or vowel-likeness) of
the consonant closest to the peak, which is a function of its phonetic class, may have an
effect on the strength of boundaries determined by the hierarchical division of the syllable
(e.g., Treiman, 1984). We examined the evidence for syllable-internal structure and for
sonority in two experiments that employed visually presented stimuli and lexical decision,
naming, and reading tasks. Our results provide support for the breakdown of the rime into
a peak and a coda and for an effect of the sonority of the postvocalic consonant on that
break. This pattern occurred only in our lexical decision tasks, so the effect is assumed to
be postlexical. We did not find an effect of the onset/rime boundary, perhaps because of an
unanticipated effect of word frequency. Our results are discussed in terms of phonological
coding in short-term memory.



Recent psycholinguistic evidence has suggested that English
syllables are organized hierarchically, divided first into an onset
(consisting of the initial consonant or consonant cluster) and a
rime (consisting of the following vowel and any additional
consonants), with the rime further divided into a peak or nucleus
(consisting of the vowel) and a coda (consisting of the remaining
consonants).¹ For example, Cooper, Whalen, and Fowler (1986) have
shown that the P-center (moment of perceptual occurrence) of a
syllable depends on the duration, though not the number, of
syllable initial consonants (the onset) and, in a later study, to a
lesser extent on the rime (Cooper, Whalen, & Fowler, 1988). This
division of the syllable into an onset and rime is particularly
well supported by a number of studies by Treiman (1983, 1986), who
taught subjects novel word games in which they were required to
recombine components from pairs of nonsense syllables or words, and
found that they were more likely to divide those syllables between
the onset and rime than elsewhere in order to complete the tasks.
More recently, Treiman and Chafetz (1987) have demonstrated
evidence for the onset/rime break in printed words, using both an
anagram and a lexical decision task. In the first case, they found
that subjects were better able to recognize a word like twist when
it was divided TW IST (at the onset/rime boundary) than when it was
divided TWI ST (between the peak and the coda). In the second case,
subjects responded more quickly in a lexical decision task when the
test item contained slashes after the onset (CR//ISP) than after
the vowel (CRI//SP).

The evidence in support of dividing the rime into a nucleus and a
coda is perhaps somewhat less compelling. Treiman (1983), using
novel word games, found only weak support for the nucleus/coda
division and suggested that the division might depend on the
phonetic makeup of the final consonant cluster. Indeed, when she
systematically varied the sonority (or vowel-likeness) of the
consonant following the vowel in VCC syllables (Treiman, 1984), she
found that subjects in a word game task tended to view liquid
consonants, which are quite vowel-like, as belonging to the nucleus
or peak; obstruents, which are not at all vowel-like, as belonging
to the final consonant cluster or coda; and nasals, which are
intermediate in terms of sonority, as showing an equal affinity to
both the nucleus and the coda. Derwing, Nearey, and Dow (1987)
obtained similar results. These findings are largely in agreement
with the proposals of MacKay (1972) and Stemberger (1983) that
liquids following the vowel be assigned to the nucleus rather than
the coda. The findings also agree with the sonority hierarchy
proposed for syllables (e.g., Hooper, 1976), which suggests that
syllable peaks are peaks of sonority, that consonant classes vary
with respect to their degree of sonority, or vowel-likeness, and
that segments on either side of the peak show a decrease in sonority
with respect to the peak.
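
The sonority ordering described here (obstruents least vowel-like, nasals intermediate, liquids most vowel-like among the consonants, vowels at the peak) can be made concrete with a small sketch. The numeric ranks, the class labels, and the function name `follows_sonority_profile` are assumptions chosen for illustration; the text specifies only the ordering. The check is also a simplification: it requires that sonority not increase when moving away from the single vowel peak.

```python
# Assumed ranks for illustration; only the ordering comes from the text.
SONORITY = {"obstruent": 1, "nasal": 2, "liquid": 3, "vowel": 4}


def follows_sonority_profile(classes):
    """Check a monosyllable given as a list of segment classes.

    True if there is exactly one vowel and sonority never increases
    when moving away from that vowel in either direction.
    """
    if classes.count("vowel") != 1:
        return False
    ranks = [SONORITY[c] for c in classes]
    peak = classes.index("vowel")
    onset, coda = ranks[:peak], ranks[peak + 1:]
    rises = all(a <= b for a, b in zip(onset, onset[1:]))
    falls = all(a >= b for a, b in zip(coda, coda[1:]))
    return rises and falls


# "crisp": obstruent, liquid, vowel, obstruent, obstruent
print(follows_sonority_profile(["obstruent", "liquid", "vowel", "obstruent", "obstruent"]))
# A hypothetical sequence with a liquid outside an obstruent on both sides.
print(follows_sonority_profile(["liquid", "obstruent", "vowel", "obstruent", "liquid"]))
```

On this scale a postvocalic liquid sits one step below the vowel and a postvocalic obstruent three steps below it, which is one way to picture why liquids might cohere with the nucleus while obstruents pattern with the coda.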

However, the evidence connecting the ease of 
the onset-nucleus break to the sonority of the 
prevocalic consonants has been less consistent. 
Treiman (1986) found that there was no effect of 
the phonetic category of the prevocalic consonant 
on the onset-rime division (suggesting that onsets 
consisting of more than one consonant remain 
cohesive), while Derwing et al. (1987) did find 
such an effect of the phonetic category of the 
prevocalic consonant. 

Most of the evidence for the hierarchical division of the syllable
into an onset and rime, and possibly into a nucleus and coda, comes
from studies that present stimuli auditorily and require subjects
to focus closely on the phonological structure of the stimuli in
order to play novel word games or perform segment interchanges. The
literature on reading is divided as to whether the phonological
code of a visual stimulus is obligatorily accessed (see, e.g., Van
Orden, in press) or whether it is accessed only under certain
circumstances (e.g., McCusker, Hillinger, & Bias, 1981). One study
that used visual stimuli and looked for evidence of the
hierarchical division of the syllable was done by Treiman and
Chafetz (1987). As mentioned above, they required subjects to
perform either an anagram or a lexical decision task on visually
presented stimuli; however, they only compared subjects' responses
to stimuli with breaks between the onset and the rime with their
responses to stimuli with breaks following the nucleus. They did
not examine the effects of breaks within initial and final
consonant clusters as compared to the two breaks mentioned above,
nor did they investigate, in this study, the effect of sonority on
the strength of these divisions. As a result of her numerous
studies, Treiman (1986) has suggested that the intrasyllabic
organization of the syllable should be recognized in theories of
speech perception and production as well as in theories of reading.

Research that has compared the results of 
lexical decision and word naming tasks (e.g., 
Seidenberg, Waters, Sanders, & Langer, 1984) 
suggests that certain effects may be postlexical, 
i.e., a result of processing that occurs after lexical 
access. Thus, such effects emerge only in lexical
decision and not in naming tasks, since naming
typically takes less time and is thus believed to 
involve less postlexical processing. It is often 
assumed, however, that naming a visually 
presented word requires accessing its phonological 
code (e.g., Seidenberg, 1986). Silent reading of 
visually presented stimuli is another task that has 
been shown to be sensitive to semantic and 
phonological priming (McNamara & Healy, 1988), 
while also presumably requiring less postlexical 
processing. It would be of interest, therefore, to 
see whether evidence for the hierarchical 
structure of the syllable can be found in each of 
these three tasks. 

The present experiments are thus designed (a) 
to replicate Treiman's (1984) finding that the 
break between the nucleus and coda varies as a 
function of the phonetic class (liquid, nasal, or ob- 
struent) of the postvocalic consonant, with postvo- 
calic liquids showing the greatest cohesion to the 
nucleus and obstruents showing the least, (b) to 
test for a similar effect of the phonetic class of the 
prevocalic consonant on the break between the on- 
set and the rime, and (c) to determine whether 
any evidence for such breaks is pre- or postlexical 
in origin by comparing the results of lexical deci- 
sion tests with those of naming and reading. 

EXPERIMENT 1 
Subjects responded orally to visually presented
stimuli, including both words and nonwords, all of
which were monosyllabic and five letters long.
Each visually presented stimulus could be
interrupted at one of six possible locations by an
asterisk. One group of subjects performed a lexical
decision task while a second group named each of
the items out loud.

Method 

Stimuli. Two sets of test items were constructed, 
one to examine the effect of the composition of 
initial consonant clusters on the cohesion of the 
onset-rime boundary of the syllable and another to 
examine the effect of the composition of final clus- 
ters on the cohesion of the rime-internal nucleus- 
coda boundary. All test items were single sylla- 
bles, contained five letters, and, with the exception 
of some of the onset-rime test items, described 
below, all had a C1C2VC3C4 phonemic structure. 

In the case of the onset-rime test words, C2 was 
either a liquid (twelve items), a nasal (six items) 
or an obstruent (six items). There were twelve 
additional five-letter words with no initial 
consonant cluster, but with an initial single 
phoneme, e.g., /ʃ/, which is normally represented 
by two letters, "sh." Nine of these items had a 
C1VC2C3 phonemic structure, and three had a 
C1VC2 structure. All were five letters long. All 
words were also of low frequency, with the mean 
frequency for the liquid items 7.3 (Kučera & 
Francis, 1967), for nasal items 9.8, for obstruent 
items 6.8, and for single-phoneme items 7.8. The 
corresponding onset-rime nonword test items were 
constructed by switching the vowel and final 
consonants of one item with the vowel and final 
consonants of another item from the same series, so 
that two nonwords were created (e.g., craft and 
flint giving flaft and crint). 
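
The rime-swapping construction described above can be illustrated with a minimal Python sketch. The function name `swap_rimes` and the `onset_len` parameter are hypothetical and appear here only for illustration; the sketch assumes that both source items spell their initial cluster with the same number of letters.

```python
def swap_rimes(word_a: str, word_b: str, onset_len: int = 2) -> tuple[str, str]:
    """Exchange the rimes (vowel plus final consonants) of two items.

    onset_len is the number of letters in the initial cluster. Treating
    it as identical for both items is an assumption of this sketch.
    """
    onset_a, rime_a = word_a[:onset_len], word_a[onset_len:]
    onset_b, rime_b = word_b[:onset_len], word_b[onset_len:]
    # Each onset keeps its place and receives the other item's rime.
    return onset_a + rime_b, onset_b + rime_a


print(swap_rimes("craft", "flint"))  # ('crint', 'flaft')
```

Because only the rimes change places, each nonword keeps the onset of one real word and the rime of another, so the word and nonword sets stay matched on their component letter strings.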

In the case of the nucleus-coda test words, there 
were twelve words each for which C3 was a liquid, 
a nasal, or an obstruent. The corresponding 
nucleus-coda nonword test items were constructed 
as above (e.g., blunt and swamp yielding swunt 
and blamp). The mean frequency for the liquid 
items was 9.8, for the nasal items 9.1, and for the 
obstruent items 9.8. 

Each word and nonword (see Appendix for the 
complete list) could appear with an asterisk in one 
of three positions. For the onset-rime test items, 
the asterisk could appear before the word 
(Position 1), after the first letter (Position 2), or 
between the second letter and the vowel (Position 
3), e.g., *CRAFT, C*RAFT, or CR*AFT. For the 
nucleus-coda test items, the asterisk could appear 
after the vowel (Position 4), after the third 
consonant (Position 5), or after the word (Position 
6), e.g., BLU*NT, BLUN*T, BLUNT*. Positions 1 
and 6 are control positions because the asterisk 
does not interrupt either the initial or the final 
consonant cluster. 
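
The six asterisk positions can be made concrete with a short Python sketch. The helper `place_asterisk` is a hypothetical name, not part of the original materials. It relies on the observation that, for a five-letter item, Position p places the asterisk immediately before the p-th letter, with Position 6 falling after the last letter.

```python
def place_asterisk(item: str, position: int) -> str:
    """Insert an asterisk into a five-letter item at Position 1 to 6."""
    if len(item) != 5:
        raise ValueError("items are assumed to be five letters long")
    if not 1 <= position <= 6:
        raise ValueError("position must be between 1 and 6")
    index = position - 1  # Position 1 is before the first letter
    return item[:index] + "*" + item[index:]


# Onset-rime items use Positions 1 to 3; nucleus-coda items use 4 to 6.
print([place_asterisk("CRAFT", p) for p in (1, 2, 3)])
print([place_asterisk("BLUNT", p) for p in (4, 5, 6)])
```

The function does not itself enforce which positions go with which item type; that restriction belongs to the design described in the text.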



Three lists of 144 test items were prepared. 
Each word and nonword appeared only once on 
each list. The order of presentation was 
pseudorandom with the following constraints: In every 
twelve items there was an equal number of onset- 
rime and nucleus-coda test words and nonwords 
and an equal number of asterisks at each of the 
six positions. For the nucleus-coda test items, in 
every group of twelve, there were two stimuli with 
a liquid, nasal, or obstruent as the C3 phoneme. 
For the onset-rime items, in the same group of 
twelve, there were two stimuli with a liquid as the 
C2 phoneme, two stimuli with a single initial con- 
sonant, and either two stimuli with a nasal as the 
C2 phoneme or two stimuli with an obstruent as 
the C2 phoneme. The three lists differed only as to 
the location of the asterisks with each one of three 
possible asterisk locations occurring once across 
lists for every stimulus. 
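
One way to satisfy the requirement that every stimulus receives each of its three asterisk locations exactly once across the three lists is a simple rotation. The Python sketch below is an assumption for illustration only: the text does not say how positions were assigned, the name `counterbalance_positions` is hypothetical, and the pseudorandom ordering constraints are not modeled.

```python
def counterbalance_positions(items, positions):
    """Build one list per position so each item meets every position once.

    In list k, item i is paired with positions[(i + k) % n]. This
    rotation is an assumed scheme, not the documented procedure.
    """
    n = len(positions)
    lists = []
    for k in range(n):
        lists.append(
            [(item, positions[(i + k) % n]) for i, item in enumerate(items)]
        )
    return lists


onset_rime_lists = counterbalance_positions(["CRAFT", "FLINT"], [1, 2, 3])
for number, stimulus_list in enumerate(onset_rime_lists, start=1):
    print(number, stimulus_list)
```

With this scheme each list also contains a mixture of positions, so no single list is confounded with one asterisk location.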

Procedure. Subjects were told that strings of let- 
ters would appear on the computer screen in front 
of them. Subjects in the lexical decision condition 
were to say "yes" if the string was a word and "no" 
if it was not. Subjects in the naming condition 
were to read the word or nonword out loud. A 
voice key was used to record subjects' response 
times. The experimenter first made sure that the 
key was responding properly to the level of the 
subject's voice, and the subject was instructed not 
to make inadvertent noises, as the key was quite 
sensitive. Subjects' responses were recorded on 
cassette tapes. The experimenter noted all errors 
in both conditions, so that the responses to those 
items would be excluded from analysis. 

Subjects. Twenty-four Wellesley College 
undergraduates were paid for their participation 
in the experiment and were assigned to conditions 
by order of arrival, according to a fixed rotation. 

Results 

The onset-rime and nucleus-coda words 
represented different sets of words, and therefore 
each set of items was analyzed separately. All 
response latencies were reciprocally converted to 
speeds for the analyses, but the resulting mean 
speeds were converted back to latencies for 
reporting in the text and in the figures. Two sets 
of analyses were performed, one on the latencies 
for correct responses and another on the error 
proportions. A response was considered an error in 
the naming task if a subject failed to respond or 
if the response was incorrect. Items were not 
treated as a random effect because the stimuli 
were not randomly selected (Wike & Church, 1976). 
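
The reciprocal transformation can be sketched in a few lines of Python. The function name `mean_latency_via_speed` is hypothetical, and the choice of responses per second as the speed unit (1000 divided by the latency in milliseconds) is an assumption; the text states only that latencies were converted to speeds, averaged, and converted back.

```python
def mean_latency_via_speed(latencies_ms):
    """Average latencies in the speed domain and convert back to ms."""
    if not latencies_ms or any(rt <= 0 for rt in latencies_ms):
        raise ValueError("latencies must be positive and non-empty")
    # Speed in responses per second (assumed unit).
    speeds = [1000.0 / rt for rt in latencies_ms]
    mean_speed = sum(speeds) / len(speeds)
    return 1000.0 / mean_speed


example = [500.0, 1000.0]
print(round(mean_latency_via_speed(example), 1))  # 666.7
```

The back-transformed value is the harmonic mean of the latencies (666.7 ms here, against an arithmetic mean of 750 ms), so unusually long response times pull the reported mean upward less than they would in a simple average.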

We need to obtain an effect of asterisk position 
in order to demonstrate syllable-internal structure 
and an interaction of asterisk position with cluster 
composition in order to demonstrate an effect of 
the sonority hierarchy on syllable-internal 
cohesiveness. 

Onset-rime words. For the onset-rime test 
words, the analyses included one between-subjects 
factor, response condition (lexical decision or 
naming), and two within-subjects factors, onset 
composition (C2 either an obstruent, liquid, nasal, 
or the second grapheme of a single phoneme) and 
asterisk position [immediately preceding the word 
(Position 1) or following the first (Position 2) or 
second (Position 3) letter]. In these analyses we 
did not find the anticipated asterisk position effect 
nor the anticipated asterisk position by onset 
composition interaction. However, we did find 
some interesting effects of response condition and 
onset composition. 

As would be expected if, indeed, the lexical 
decision task requires additional postlexical 
processing, the mean latency for naming (673 ms) 
was shorter than that for lexical decision (831 ms). 
Likewise, the error proportions were higher for 
lexical decision (.092) than for naming (.042). 
In the overall analysis of response latency for 
onset-rime words, there was a significant main 
effect of response condition (lexical decision vs. 
naming), F(1,22) = 12.37, p = .0022, MSe = 5.7176. 
The effect of response condition was also 
significant in the error analysis, F(1,22) = 8.36, 
p = .0083, MSe = .1824. 

Although the differences in frequency were 
small among the words comprising the different 
onset composition groups, the differences in 
frequency seem to have produced corresponding 
differences in both mean latencies and error 
proportions. Recall that the mean frequency for 
nasals was 9.8, for one-phoneme items 7.8, for 
liquids 7.3, and for obstruents 6.8. 
Correspondingly, the mean latencies for nasal test 
items (722 ms), one-phoneme items (729 ms), liquids 
(734 ms), and obstruents (796 ms) increased as the 
items became less frequent, as did the error 
proportions, with one small reversal (nasals .028, 
one-phonemes .067, liquids .061, obstruents .111). 
In the latency analysis, there was a significant 
main effect of onset composition, F(3,66) = 6.26, 
p = .0011, MSe = .2513, which was also significant 
in the error analysis, F(3,66) = 4.33, p = .0077, 
MSe = .0844. 

The effect of onset composition, which 
presumably reflected the frequency of the words for 
each of the onset composition types, was evident 
for the lexical decision task but not for the 
naming task. As was the case for the combined data, 
for the data from the lexical decision task, 
latencies increased as word frequency declined, and 
error proportions also increased, with one small 
reversal. The latencies and error proportions in 
the lexical decision task were 766 ms and .042 for 
nasals, 808 ms and .093 for one-phoneme items, 
844 ms and .081 for liquids, and 923 ms and .153 
for obstruents. There was a significant interaction 
of onset composition with response condition, 
F(3,66) = 3.61, p = .0174, MSe = .1449, in the 
latency analysis, but not in the error analysis. In 
a separate planned analysis of the lexical decision 
latencies, done to investigate the source of this 
interaction, there was a significant effect of 
onset composition, F(3,33) = 6.85, p = .0013, 
MSe = .3114, which was marginally significant as 
well in an error analysis of the lexical decision 
task, F(3,33) = 2.57, p = .0699, MSe = .0762. There 
were no significant effects in either the latency 
or the error analysis of the naming data, so this 
pattern seems limited to the lexical decision data 
(see Figure 1). 

Nucleus-coda words. For the nucleus-coda test 
words, the analyses included one between-subjects 
factor, response condition (lexical decision or 
naming), and two within-subjects factors, coda 
composition (C3 either an obstruent, liquid, or 
nasal) and asterisk position [immediately 
following the vowel (Position 4) or following C3 
(Position 5) or C4 (Position 6)]. 

As found for the onset-rime words and as 
expected under the assumption that the lexical 
decision task requires more (postlexical) 
processing than does the naming task, the mean 
latency for lexical decision (830 ms) was longer 
than for naming (657 ms). Likewise, the mean error 
proportion for lexical decision was higher (.103) 
than for naming (.029). In the analysis of the 
nucleus-coda test items, there was a significant 
effect of response condition, lexical decision vs. 
naming, F(1,22) = 15.26, p = .0010, MSe = 5.3887. 
This effect was also significant in the overall 
error analysis, F(1,22) = 22.78, p = .0002, 
MSe = .3025. 

Just as there was an effect of onset composition 
for the onset-rime words, there was an effect of 
coda composition for the nucleus-coda words. 
However, the effect in this case was only evident 
for latencies, not errors, and did not reflect 
differences in word frequency, which were minimal. The 
overall latency (combining data from the lexical 
decision and naming tasks) to C3 obstruents (708 
ms) was shorter than to nasals (723 ms), which 
were in turn shorter than to liquids (775 ms). (See 
Figure 2.) There was a significant main effect of 
coda composition in the latency data, 
F(2,44) = 16.66, p < .0001, MSe = .2914, which was 
not significant in the error analysis. 
None of the remaining effects in the latency 
analysis were significant. Most crucially, there 
was no effect of asterisk position or interaction of 
asterisk position and coda composition. There 
were, however, several other interesting effects in 
the error analysis, and the expected effect of 
asterisk position and the expected interaction of 
asterisk position and coda composition were 
evident for the lexical decision task, but not for 
the naming task. (See Figure 2.) Overall error 
proportions in Position 5 (.110) were higher than 
in Position 4 (.057) or in Position 6 (.031). 
Whereas the most errors occurred in Position 5 
overall and for all coda compositions with the 
lexical decision data, with the naming data the 
most errors occurred in Position 6 for the 
obstruents, in Position 5 for the nasals, and in 
Position 4 for the liquids. (See Figure 2.) The 
main effect of asterisk position was significant 
in the error analysis, F(2,44) = 10.42, p = .0004, 
MSe = .1161. There was also an asterisk position by 
response condition interaction, F(2,44) = 10.57, 
p = .0004, MSe = .1178, and a three-way interaction 
of asterisk position by response condition by coda 
composition, F(4,88) = 2.92, p = .0253, MSe = .0362. 

As with the onset-rime words, planned analyses 
were conducted on the data with the nucleus-coda 
words separately for the lexical decision and 
naming tasks. For the lexical decision latencies, as 
for the combined latencies, there was an effect of 
coda composition, with latencies to items in which 
C3 was an obstruent shorter (799 ms) than those 
in which C3 was a nasal (828 ms), which were in 
turn shorter than those in which C3 was a liquid 
(866 ms). There was a significant main effect of 
coda composition in the analysis of the lexical 
decision latencies, F(2,22) = 5.44, p = .0120, 
MSe = .0841. 

For the lexical decision errors, as for the 
combined errors, there was an effect of asterisk 
position, with the most errors (.19) occurring when 
the asterisk appeared between C3 and C4 in 
Position 5, next most (.08) when the asterisk 
appeared between the vowel and C3 in Position 4, 
and fewest (.04) when the asterisk appeared at the 
end of the word in Position 6. However, as 
anticipated, the effect of asterisk position 
depended on coda composition to some extent. As 
can be seen in Figure 2, obstruents and, to a lesser 
extent, nasals showed a dramatic increase in the 
proportion of errors when the asterisk intervened 
at Position 5 between C3 and C4, but the increase 
in errors for liquid items with an asterisk at 
Position 5 was less pronounced. There was a 
significant main effect of asterisk position in the 
error analysis of the lexical decision task, 
F(2,22) = 14.50, p = .0002, MSe = .2339. There was 
also a marginally significant interaction of 
asterisk position and coda composition, 
F(4,44) = 2.44, p = .0601, MSe = .0400. 

For the naming latencies, as for the lexical 
decision latencies and the combined latencies, 
responses to items with C3 as an obstruent were 
shorter (635 ms) than to those with a nasal (641 
ms), which in turn were shorter than to those with 
a liquid (700 ms). The main effect of coda 
composition was significant, F(2,22) = 12.06, 
p = .0004, MSe = .2355, and there were no other 
significant effects in the analysis of naming 
latencies. There were no significant effects at all 
in the error analysis of the naming data. 

Discussion 

Our analysis of the words designed to test the 
cohesiveness of the onset-rime boundary and the 
possible effect of the sonority hierarchy on that 
boundary produced some surprising results. There 
were no effects of syllable-internal structure or 
sonority in the naming data. The lexical decision 
data also failed to demonstrate any such effects, 
but showed an apparent effect of word frequency, 
in both the latency and error analyses. 

The analysis of the nucleus-coda test items 
proved somewhat more promising with respect to 
syllable structure and sonority (see also Treiman, 
1984, 1986). In the overall error analysis, there 
were significantly more errors when the asterisk 
intervened at Position 5 (between the two 
consonants of the coda) than at Position 4 
(immediately after the vowel) or at Position 6 (at 
the end of the word). These results suggest that 
interruption at the nucleus-coda boundary (after 
the vowel) is less disruptive than within the coda 
itself. In both the separate lexical decision and 
naming latency analyses there were significant 
main effects of coda composition, with responses 
to obstruent items faster than to those with a 
nasal, which in turn were faster than those with a 
liquid. Indeed, this was the only significant effect 
found in the separate analysis of the naming data. 
On the other hand, in the error analysis of the 
lexical decision data, there was a significant effect 
of asterisk position, showing that the disruptive 
effect of the asterisk appearing within the coda is 
a postlexical effect. Finally, there was also a 
marginally significant interaction for the lexical 
decision error analysis of the coda composition 
with asterisk position. This interaction provided 
partial support for the notion that the class of the 
postvocalic consonant affects the cohesiveness of 
the nucleus and the coda. Postvocalic obstruents 
are lowest on the sonority hierarchy. Thus, they 
are expected to show the least cohesiveness with 
the preceding vowel and the most cohesiveness 
with the final consonant, followed by nasals and 
then liquids. As Figure 2 illustrates, errors were 
greatest for test items with an obstruent when the 
asterisk interrupted the rime at Position 5. Nasal 
test items showed a similar disruption in that 
position. On the other hand, liquid test items 
should have shown more errors with an asterisk 
in Position 4, rather than an increase at Position 
5, because of the greater cohesiveness of liquids to 
the preceding vowel. However, that was not the 
case. 

Because of the constraints we followed in 
constructing the stimuli for this experiment, it 
was not possible to have all test items begin with 
the same sound, which would have been ideal 
since we used a voice key to record subjects' 
responses. We wondered whether the various 
phonetic identities of the first consonants of our 
test items had had an effect on the naming speeds. 
We also wondered whether our use of the voice 
key to record subjects' responses in the lexical 
decision task had introduced greater variability in 
the response times than would have been the case 
with a reaction time key (see, e.g., Pechmann, 
Reetz, & Zerbst, 1989). If so, it might explain why 
our evidence for syllable-internal structure and for 
some influence of the sonority hierarchy on the 
nucleus-coda boundary only emerged in the error 
analysis. We decided to repeat the experiment 
using a typical reaction time key and substituting 
a silent reading task for our naming task. 

EXPERIMENT 2 
In this experiment, we compared the responses 
of one group of subjects in a lexical decision task 
to those of another group of subjects whose task 
was to read the word and nonword stimuli silently 
and to press a key as soon as they were done with 
each item. McNamara and Healy (1988) have 
demonstrated semantic and rhyme facilitation 
with a self-paced reading task of this type, which, 
however, like naming, is assumed to involve less 
postlexical processing than lexical decision. 

Method 

Stimuli. The same stimuli used in Experiment 1 
were used in Experiment 2. 

Procedure. The procedure was essentially the 
same as in Experiment 1, except that a reaction 
time key was used instead of a voice key. Subjects 
in the reading condition were to read the word or 
nonword silently and to press a button with the 
index finger of their right hand as soon as they 
had finished reading each item. Subjects in the 
lexical decision condition were to decide whether 
or not each letter string was an English word. 
They were told to rest the index finger of their 
right hand on the "yes" button and the index 
finger of their left hand on the "no" button, and to 
press "yes" as quickly as possible if the string was 
a word, and "no" as quickly as possible if the 
string was not a word. They were told that both 
speed and accuracy would be scored by the 
computer. 

Subjects. Thirty-six male and female 
undergraduate students from the University of 
Colorado at Boulder participated in this 
experiment. They received course credit for their 
participation. They were assigned to conditions by 
order of arrival, according to a fixed rotation. 

Results 

As in Experiment 1, the onset-rime and nucleus- 
coda test words were analyzed separately. Two 
sets of analyses were performed, one on error 
rates (for the lexical decision data only) and 
another on the latencies for correct responses. All 
response latencies were reciprocally transformed 
to speeds for the analyses, but the resulting mean 
speeds were converted back to latencies for 
reporting in the text and in the figures, as for 
Experiment 1. Also as for Experiment 1, items 
were not treated as a random effect because the 
stimuli were not randomly selected (Wike & 
Church, 1976). 

Onset-rime words. As in Experiment 1 and as 
anticipated given that the lexical decision task 
presumably requires additional postlexical 
processes not included in the reading task, the 
mean latency for reading (758 ms) was 
considerably shorter than that for lexical decision 
(920 ms). In the overall latency analysis of the 
onset-rime test items there was a significant main 
effect of lexical decision vs. reading, 
F(1,34) = 4.56, p = .0378, MSe = 5.8280. Also in 
accord with predictions, based on the assumption 
that the asterisk should be least disruptive when it 
precedes the word, the response latency for 
stimuli with immediately preceding asterisks 
(Position 1) was shorter than for stimuli with 
asterisks in Positions 2 and 3 (mean speeds of 1.234 
vs. 1.188 and 1.189, respectively). There was a 
significant main effect for asterisk position, 
F(2,68) = 4.44, p = .0152, MSe = .0999. 

Also as in Experiment 1, despite the small 
differences in frequency among the words 
comprising the different onset composition groups, 
the average latency for each onset composition 
varied largely as a function of the frequency of the 
words in the four groups, with more frequent 
words producing shorter latencies. Thus, latency 
of response (805 ms) was shortest to nasal test 
items (mean frequency 9.8), followed by the 
latency of response (817 ms) to single-phoneme 
test items (mean frequency 7.8), followed by a 
minor reversal, with response latency (855 ms) to 
liquid test items (mean frequency 7.3) slightly 
longer than average latency (850 ms) to obstruent 
test items (mean frequency 6.8). There was a main 
effect of onset composition, F(3,102) = 8.43, 
p = .0001, MSe = .1338, as well as a significant 
interaction of onset composition with lexical 
decision vs. reading, F(3,102) = 5.21, p = .0026, 
MSe = .0826. 

Separate planned analyses of the reading and 
lexical decision onset-rime word data were 
conducted to explore the source of the interaction. 
In Experiment 1 the correlation of word frequency 
and onset composition class was evident for the 
lexical decision task but not for the naming task. 
Similarly, the correlation of word frequency and 
onset composition class in the present experiment 
occurred in the lexical decision task but not in 
the reading task. There was an effect of onset 
composition on reading, but this effect was clearly 
due to the difference between those words 
in which C2 was a liquid (777 ms) and all the 
others (obstruent = 751 ms, one phoneme = 751 
ms, and nasal = 752 ms). In the separate reading 
analysis, there was a significant effect of onset 
composition, F(3,51)=3.40, p =.0241, MSe=.0254. 
There were no other significant effects in the 
reading analysis. 

The latency data from the lexical decision task 
alone mirror the combined data from both tasks. 
As in the overall data, latencies in the lexical 
decision task to words where the asterisk appeared 
at the beginning were shorter (884 ms) than those to 
words where the asterisk appeared after the initial 
consonant (940 ms) or just before the vowel 
(937 ms), as expected because the asterisk should 
be more disruptive when it occurs in the middle of 
a word than when it precedes the word. The effect 
of asterisk position was marginally significant in 
the lexical decision latency analysis, F(2,34)=3.16, 
p =.0538, MSe=.1022. 

As can be seen in Figure 3, both the pattern of 
errors (which were analyzed for the lexical 
decision task only, because no errors were possible 
in the reading task) and the pattern of response 
latencies for the lexical decision task varied as a 
function of the frequency of the four groups of 
words, with error proportions (with the exception 
of one small reversal) lower for more frequent 
items, and with latencies shorter for the more 
frequent items, as in Experiment 1. The mean 
latencies, given in terms of nasal, single-phoneme, 
liquid, and obstruent test items (that is, in order 
from most to least frequent), were 866 ms, 895 ms, 
949 ms, and 977 ms, whereas the error 
proportions (in the same order) were .046, .120, 
.111, and .185. There was a significant effect of 
onset composition for both the latencies, 
F(3,51) = 7.87, p = .0004, MSe = .1910, and the error 
proportions, F(3,51) = 5.31, p = .0032, MSe = .1740. 

Nucleus-coda words. As found in Experiment 1 
and for the onset-rime words in the present 
experiment and as expected under the assumption 
that the lexical decision task requires more 
postlexical processing than does the reading task, 
the mean overall latency for reading nucleus-coda 
words (765 ms) was considerably shorter than that 
for lexical decisions on those words (939 ms). For 
the combined analysis of the nucleus-coda test word 
latencies, there was a significant main effect of 
lexical decision vs. reading, F(1,34) = 5.14, 
p = .0281, MSe = 4.751. 

Just as we predicted and found that asterisks 
were less disruptive when they preceded a word 
than when they occurred in the middle of a word 
for the onset-rime stimuli, the asterisks should be 
less disruptive when they follow a word than 
when they occur in the middle of a word for the 
nucleus-coda stimuli. Indeed, the latencies for 
words with item-final, Position 6 asterisks (825 ms) 
were shorter than those for Position 4 asterisks 
(842 ms), which were in turn shorter than those for 
Position 5 (862 ms). There was a significant main 
effect of asterisk position in the combined analysis 
of the nucleus-coda test word latencies, 
F(2,68) = 5.30, p = .0074, MSe = .0730. 

Most crucial is the predicted interaction of coda 
composition and asterisk position. The predicted 
pattern was found for the lexical decision 
latencies, but not for the errors in the lexical 
decision task nor for the latencies in the reading 
task. As anticipated, the obstruents and nasals 
showed longer lexical decision latencies at 
Position 5, whereas the liquids showed the longest 
lexical decision latencies at Position 4. (See 
Figure 4.) In a separate planned analysis of the 
lexical decision latencies, there was, in addition 
to a significant main effect of asterisk position, 
F(2,34) = 6.7, p = .0075, MSe = .0990, a marginally 
significant interaction of coda composition and 
asterisk position, F(4,68) = 2.44, p = .0543, 
MSe = .0335. 

[Figure 3. Panel titles: EXPERIMENT 2, LEXICAL DECISION, ONSET-RIME TEST WORDS (two panels); EXPERIMENT 2, READ, ONSET-RIME TEST WORDS. Horizontal axis: POSITION. Legend: OBSTRUENT, LIQUID, 1 PHONEME, NASAL.] 

Figure 3. The results of the onset-rime test words in Experiment 2 as a function of the phonetic class of C2 and of 
asterisk position. The asterisk appears before the word at Position 1, after the first letter at Position 2, and between the 
first and second letter at Position 3. Panels (a) and (b) are for the lexical decision task; panel (c) is for the reading task. 
The latency analysis is shown in panels (a) and (c); the error analysis is shown in panel (b). 

[Figure 4. Panel titles: EXPERIMENT 2, LEXICAL DECISION, NUCLEUS-CODA TEST WORDS (two panels); EXPERIMENT 2, READ, NUCLEUS-CODA TEST WORDS.] 

Figure 4. The results of the nucleus-coda test words in Experiment 2 as a function of the phonetic class of C3 and of 
asterisk position. The asterisk appears after the vowel at Position 4, between the last two letters at Position 5, and after 
the word at Position 6. Panels (a) and (b) are for the lexical decision task; panel (c) is for the reading task. The latency 
analysis is shown in panels (a) and (c); the error analysis is shown in panel (b). 

It should be noted that although this crucial 
interaction was only marginally significant by this 
test, the statistic used was very conservative 
because it was not directional. If a directional test 
were employed (which seems appropriate in this 
case because a specific pattern of results was 
anticipated and obtained), then the results would 
be clearly significant. In any event, there were no 
significant effects in the separate analysis of mean 
proportion errors for lexical decision nor in the 
separate analysis of the latencies for the reading 
data. 

Discussion 

As in Experiment 1, we found highly consistent 
significant differences between our two tasks in 
both the latency and error analyses. These 
significant effects are, of course, consistent with 
the notion that the lexical decision task requires 
additional processing. 

When we consider the onset-rime data, the most 
interesting effect that emerged is the effect of 
onset composition, such that speeds and error 
rates varied largely as a function of the frequency 
of the stimuli in each of the onset-composition 
groups. As in Experiment 1, both the separate 
latency and error analyses of the lexical decision 
data showed that responses to the different 
onset-composition groups varied as a function of 
their frequency. On the other hand, the main effect 
of onset composition in the separate latency 
analysis of the reading data was due to slower 
response times to stimuli with liquids as the 
second consonant. There was also an effect of 
asterisk position in the lexical decision latency 
analysis, but it provided no support for the 
internal structure of the syllable, because there 
was no difference in the latencies to words with 
asterisks appearing within the onset as compared to 
those with asterisks between the onset and the 
vowel. However, the response latencies in both of 
those positions were marginally significantly 
longer than when the asterisk appeared at the very 
beginning of the word. 

As in Experiment 1, it was only the analysis of 
the nucleus-coda data that provided some support 
for the notion of syllable-internal structure and for 
the influence of the sonority hierarchy on that 
structure. Thus, asterisks placed between the 
nucleus and the coda were less disruptive than those 
placed within the coda, for the lexical decision 
analysis. More importantly, in the separate 
latency analysis of the lexical decision data, there 
was an interaction (which was marginally 
significant by a conservative non-directional test) 
between asterisk position and coda composition, 
so that test items with postvocalic liquid 
consonants produced the longest latency of 
response when the asterisk appeared immediately 
after the vowel in Position 4, whereas test items 
with postvocalic nasals and stops produced the 
slowest speeds of response when the asterisk 
appeared just before the final consonant in 
Position 5. This pattern is consistent with an 
effect of the sonority hierarchy on the 
nucleus-coda boundary, because liquids are higher on 
the sonority hierarchy and therefore more cohesive 
with the preceding vowel (hence the longer 
latency for asterisks in Position 4), whereas 
obstruents and nasals are lower on the sonority 
hierarchy and therefore more cohesive with the 
following consonant (hence the slower speeds for 
asterisks in Position 5). 

GENERAL DISCUSSION 

We found evidence, but only in our lexical 
decision tasks, in support of the division of the 
rime into a nucleus and a coda as well as evidence 
that suggests that the sonority of the postvocalic 
consonant affects the strength of that break. It 
appears from our data that these syllable- 
structure effects are postlexical (occurring in the 
lexical decision rather than in the naming or 
reading tasks). 

On the other hand, despite the wealth of 
psycholinguistic evidence supporting the syllable- 
internal structures of onset and rime, we were 
unable to find evidence to support this division in 
our two experiments. Instead, we found evidence 
of a word frequency effect, even though we 
controlled for word frequency, such that the 
differences among the word frequencies in the four 
onset groups were not significant. This 
unanticipated word-frequency finding has 
potential methodological import. Given multiple 
experimental constraints, researchers have 
probably been unable in many cases to find exact 
frequency matches for their stimuli. They have 
probably generally assumed that small frequency 
differences of the type that separated our groups 
of onset-rime words would be unlikely to produce 
any effect. Furthermore, the finding also has 
theoretical import, since these small frequency 
differences turn out, at least in this case, to 
matter significantly. Indeed, our word frequency 
effect was strong enough, occurring in both 
experiments and for both accuracy and latencies, 
to override any effect of the onset-rime break. 
We would suggest that previous studies that 
supported the notion of a break between the onset 
and rime, even with nonword stimuli, were able to 
find such evidence because the tasks that they 
employed relied largely on a form of phonological 
coding used to maintain information in short-term 
memory, a form of phonological coding which may 
not be required by simple naming and reading 
tasks. 

Besner and Davelaar (1982) present evidence 
that the phonological code used to achieve lexical 
access from print is not the same phonological 
code used to maintain information in short-term 
memory. In particular, they found that subjects 
better recalled nonwords with an entry in the 
phonological lexicon (e.g., BRANE) than nonwords 
without such an entry (e.g., SUNT) even under 
conditions of articulatory suppression, whereas effects 
of phonological similarity and word length 
were eliminated by articulatory suppression. 
Because of the opposing effects of articulatory 
suppression, they argue that there are two phono- 
logical codes. The first phonological code permits 
lexical access, whereas the second code, more 
strongly affected by articulatory suppression, is 
used to maintain information in short-term memory. 
If we assume that the first phonological code 
not only permits lexical access but also subserves 
naming and that effects of syllable structure and 
sonority emerge through use of the second, short- 
term-memory phonological code, then we can rec- 
oncile our results with those of previous studies. 

The majority of the psycholinguistic studies 
finding evidence in support of the hierarchical 
structure of the syllable involve tasks that require 
the maintenance of information in short-term 
memory. The novel word games task used 
frequently by Treiman (e.g., 1983, 1984, 1986) and 
the substitution-by-analogy task (where subjects 
switch specified parts of two jointly presented 
monosyllabic strings) used by Derwing et al. 
(1987), Dow (1987), Fowler (1987) and others 
involve such a demand. Thus, it is reasonable to 
assume that they required use of the phonological 
code that maintains information in short-term 
memory and from which effects of syllable 
structure and sonority emerge. Indeed, Treiman 
and Danis (1988) demonstrated syllable structure 
effects using a short-term memory task. 

Perhaps lexical decision, unlike naming and 
reading, makes a greater demand on short-term 
memory. For example, subjects in a lexical 
decision task may store accessed items in short- 
term memory for decision processing. Our 
consistently significant differences between lexical 
decision, on the one hand, and naming and 
reading on the other, support, as do many other 
studies, the notion of additional post-lexical 
processing in lexical decision tasks. We suggest 
that this processing may entail maintenance of 
the accessed item in short-term memory. If 
evidence for the syllable's internal organization 
and for the influence of the sonority hierarchy on 
that organization emerges only in tasks that 
require the maintenance of information in short-term 
memory, and if lexical decision requires such 
maintenance, then it is not surprising that our 
results supporting syllable-internal structure 
emerged only in the lexical decision task. 

However, we found support only for the break- 
down of the rime into a peak and a coda, whereas 
Treiman and Chafetz (1987) found, also using a 
lexical decision task, that subjects responded more 
rapidly to visually presented words and nonwords 
when slashes appeared between the onset and the 
rime than when they appeared between the peak 
and the coda. There are at least two possible 
sources for this discrepancy. In the first place, 
they compared visual interruptions after the onset 
and after the peak within the same set of words 
and nonwords, whereas we used different words to 
test the strength of the onset-rime boundary and 
the nucleus-coda boundary. We thus could not 
compare directly the strength of these two boundaries. 
Secondly, we found an unanticipated, significant 
effect of onset type, apparently related to the 
frequency of the stimulus items, that may have ef- 
fectively masked differences between interrup- 
tions that occurred within the onset and those 
that occurred between the onset and the rime and 
that may have also conceivably masked an inter- 
action of the sonority hierarchy with syllable 
structure. In any event, given the pattern of our 
other results, we would predict an onset-rime 
boundary effect to emerge only postlexically, in a 
lexical decision task or other task requiring 
maintenance of information in short-term 
memory. 

Fowler (1987) and Browman and Goldstein 
(1988) have argued that the syllable's internal 
structure may arise as a result of articulatory 
constraints on the timing of initial versus final 
consonants with respect to vowels in the same 
syllable. Because the phonological code required to 
maintain information in short-term memory is 
more strongly affected by articulatory suppression 
than the phonological code permitting lexical 
access (according to Besner and Davelaar, 1982), 
it would seem reasonable to suggest that it too has 
an articulatory basis (see, e.g., Hintzman, 1967). 
In any event, the results of our experiments taken 
in conjunction with prior psycholinguistic research 
on the internal structure of the syllable and the 
sonority hierarchy would suggest the following: 
Support for the hierarchical structure of the 
syllable and for the influence of the sonority 
hierarchy on such structure is most likely to 
emerge in tasks that implicate phonological coding 
in short-term memory. 

FOOTNOTES 

*Appears in Journal of Psycholinguistic Research, Vol. 20(4), 
337-343 (1991). 

†Also French Department, Wellesley College. 
††University of Colorado, Boulder. 
†††University of Colorado. Now at Widener University, Chester, 
PA. 

¹See Selkirk (1982) for theoretical arguments that the syllable is 
hierarchically organized, but see Davis (1987) for arguments 
that the syllable is divided, nonhierarchically, into onset, peak 
and coda. 

²Four of the six nasal items and one of the six obstruent items 
had a C1C2VC3 structure, although all were five letters long. 

³Three of the items in this group began with the letters "ch," 
characterized by some phonologists as a single phoneme /č/ 
and by others as a sequence of two phonemes /tš/. 

⁴We inadvertently included two items that appeared both as 
onset-rime and as nucleus-coda test items, stern and brand. 
The associated nonwords were different in each case. 

⁵We do not report the results of our analysis of the nonword 
data because the significant effects provided no support for 
syllable internal structure or sonority and were inconsistent 
across the two experiments. A comparison of the speed 
analysis and the error analysis also indicated a number of 
probable speed-accuracy tradeoffs, although these were not 
evident in the word data. 

⁶This transformation produced more normally distributed 
values and eliminated disproportionate influences by outliers. 

⁷Although there were differences in frequency in the onset-rime 
groups, these differences were not significant, 
F(3,32) = .142, p = .9208, MSe = 69 J620. Nonetheless, we believe 
that the onset effect is best explained in terms of word 
frequency. We examined single-letter and di- and trigram 
frequencies (Mayzner & Tresselt, 1965; Mayzner, Tresselt, & 
Wolin, 1965) and found no correlation with the pattern of our 
results for onset-rime (or coda) test words. Furthermore, both 
the word and the nonword stimuli had the same initial 
consonant clusters, but the onset effect only occurred in the 
word data. Finally, as suggested by an anonymous reviewer, 
we compared the mean latencies of the subjects in our two 
experiments to s+n onset-rime words (which are relatively 
infrequent) and s+m onset-rime words (which are relatively 
frequent) and found a significant frequency effect there as 
well, f P9)>3.m, p = .0031, two-tailed. 

APPENDIX 



Test Items Used in Experiments 1 and 2 

Onset-Rime 

  Word     Nonword    Word     Nonword 

  C1C2 = 1 phoneme 
  chest    chorn      thorn    thest 
  chill    ching      thing    thill 
  shark    sheft      theft    thark 
  shunt    thump      chump    chunt 
  shawl    chawl      champ    shamp 
  thumb    shumb      shirt    thirt 

  C2 = liquid 
  craft    flaft      flint    crint 
  drank    glank      glint    drint 
  clasp    blasp      blend    clend 
  prank    trank      tramp    pramp 
  plump    brump      brand    pland 
  clink    grink      grind    clind 

  C2 = nasal 
  smart    snart      smash    snash 
  sniff    smiff      snarl    smarl 
  smell    snell      snuff    smuff 

  C2 = obstruent 
  stern    spern      spasm    stasm 
  skunk    scunk      scowl    skowl 
  stark    scark      skimp    stimp 

Nucleus-Coda 

  C3 = obstruent      C3 = liquid 
  Word     Nonword    Word     Nonword 
  blast    crasp      dwarf    smarf 
  crisp    blisp      smirk    dwirk 
  brisk    crisk      scald    scort 
  crust    brust      snort    snald 
  cleft    greft      scalp    scern 
  grist    clist      stern    stalp 
  draft    twaft      spark    skark 
  twist    drist      skirt    spirt 
  tract    traft      sport    sporm 
  graft    gract      storm    stort 
  grasp    frasp      spurt    spirl 
  frost    grost      swirl    swurt 

  C3 = nasal 
  Word     Nonword    Word     Nonword 
  blunt    swunt      swamp    blamp 
  blank    slank      slump    blump 
  print    clint      clump    prump 
  stint    blint      blond    stond 
  blink    blant      scant    scink 
  trunk    trand      brand    brunk 


Effects of Phonological and Phonetic Factors on Cross- 
Language Perception of Approximants* 

Catherine T. Best† and Winifred Strange†† 



Past research suggests that degree of difficulty adults have with discriminating 
nonnative segmental contrasts varies considerably across contrasts and languages. 
According to a recent proposal, this variation may be explained by differences in how the 
nonnative phones are perceptually assimilated into native phoneme categories (Best, 
McRoberts & Sithole, 1988). The present study examined that proposal by testing 
identification and discrimination of three synthetic series of American English 
approximant contrasts, presented to American English-speaking subjects and native 
Japanese-speaking learners of English. The English approximants differ with respect to 
their phonemic status in Japanese, as well as in the phonetic details of the most similar 
Japanese phonemes. The perceptual assimilation hypotheses were strongly upheld in 
cross-language comparisons. Moreover, on the assumption that perceptual assimilation 
may be modified by learning the second language (L2), we also evaluated differences 
between subgroups of the Japanese subjects who had two different levels of English 
conversation experience. Those with intensive English conversation experience showed 
identification and discrimination patterns that were more similar (but not identical) to the 
Americans' performance than did those who had had little English experience. 



1. INTRODUCTION 

Language-specific experience influences the 
perception of phoneme contrasts. Adults are often 
hampered in their identification and/or discrimi- 
nation of phones that are not employed con- 
trastively in the phonological system of their 
language. For example, monolingual Japanese 
and Korean speakers have difficulty distinguishing 
the American English liquids /r/ and /l/, 
which do not occur contrastively in their native 
languages (Gillette, 1980; Goto, 1971; Miyawaki, 
Strange, Verbrugge, Liberman, Jenkins, & 
Fujimura, 1975; Sheldon & Strange, 1982). 
Analogously, English speakers have difficulty 
with some nonnative contrasts such as the Czech 
retroflex vs. palatal fricatives (Trehub, 1976), Thai 
voiced vs. voiceless unaspirated stops (Lisker & 
Abramson, 1970), Hindi dental vs. retroflex stops, 
and Salish velar vs. uvular ejectives (Polka, 1991; 
Tees & Werker, 1984; Werker & Tees, 1984). This 
perceptual difficulty, however, appears to be 
neither universal nor immutable. Some nonnative 
contrasts are relatively easy to discriminate even 
without prior exposure or training (e.g., Best, 
1992; Best, McRoberts, & Sithole, 1988). 
Perceptual difficulties with particular contrasts 
also vary depending on syllable position and 
phonetic context (e.g., Mochizuki, 1981). Other 
contrasts are distinguishable when listening 
conditions minimize memory demands or 
phonemic categorization (Carney, Widin, & 
Viemeister, 1977; Werker & Logan, 1985). 
Discrimination of nonnative contrasts that are 
initially difficult for adults can sometimes be 
improved rapidly through laboratory training 
(e.g., Pisoni, Aslin, Perey, & Hennessy, 1982), 
while others are resistant to change (Strange & 
Dittmann, 1984). Perception of non-native 
contrasts improves in the course of learning to 
speak a second language (L2), even in adulthood 
(e.g., MacKain, Best, & Strange, 1981), although 
improvement is often more marked if exposure to 
L2 occurs before puberty (Tees & Werker, 1984; 
Yamada & Tohkura, 1991; see Flege, 1988). 
Furthermore, some individuals appear to be more 
sensitive than others to nonnative distinctions 
even without experience or training (e.g., subject 
M. K. in MacKain et al., 1981; see also Polka, 
1991; Pruitt, Strange, Polka, & Aguilar, 1990). 

The fact that native language experience constrains 
perception of nonnative contrasts, but that 
further experience with nonnative sounds may 
nonetheless alter those perceptual constraints 
even in adults, raises questions about the nature 
of the native-language influence. Specifically, 
what properties do listeners perceive in nonnative 
sounds, and how might those properties relate to 
the perceived properties of native phonemes? 

Recently, it has been proposed that mature 
listeners perceptually assimilate most nonnative 
phones to native categories (Best, 1992; Best et 
al., 1988; cf. Flege, 1990). That is, the nonnative 
phones are perceived in terms of their similarities 
(and dissimilarities) to native phonemes. 
According to this model, mature language users 
assimilate nonnative speech sounds to native 
categories on the basis of their perceived gestural 
(articulatory-phonetic) similarities to native 
phones (Best, 1992). The gestural similarities and 
dissimilarities referred to are based on the model 
of gestural phonology proposed by Browman and 
Goldstein (e.g., 1986, 1989; Goldstein & Browman, 
1986), i.e., they refer to temporal and spatial 
properties (i.e., degree and location of 
constrictions) of the dynamic movements of vocal 
tract articulators such as lips, jaw, tongue body, 
glottis, etc. 

Four perceptual assimilation patterns are possi- 
ble: 1) The two members of the nonnative contrast 
may be assimilated into two categories in the 
native phonology; 2) Both nonnative phones may 
be assimilated equally well (or poorly) into a single 
category; 3) Both may be assimilated into a 
single category, but unequally, thus showing a 
category goodness difference in their fit to the 
native phoneme; or 4) The nonnative phones may 
differ so much from the phonetic properties of native 
phonemes that they are non-assimilable. 
Note that the assimilation pattern depends on the 
listener's perception of similarities; listeners may 
differ from one another, even within the same na- 
tive language, with respect to which phonetic 
properties of a nonnative phone they may detect 
or attend to in perception. (Although it might be 
argued that nonnative phones are assimilated on 
the basis of acoustic-phonetic similarities rather 
than, or in addition to, gestural similarities, the 
distinction is difficult to make because articulatory- 
and acoustic-phonetic properties are confounded 
in the signal.) 

Best and colleagues (1988, 1992) predicted that 
phones that are assimilated equally to a single 
category should prove most difficult to discriminate. 
Discrimination of phones assimilated to two 
different native categories should be quite good, 
while contrasts that are non-assimilable, or those 
that show a category goodness difference in as- 
similation, should result in intermediate and 
variable levels of discrimination difficulty. The 
level of discrimination for nonnative phones that 
differ in category goodness should depend on the 
degree of perceived phonetic similarity between 
the native phoneme category and each of the nonnative 
phone categories. Non-assimilable contrasts 
are perceived as nonspeech sounds rather 
than as phonological segments; for them, discrimination 
difficulty should be a function of acoustic 
similarity. 
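
The mapping from assimilation pattern to predicted discrimination difficulty can be written out as a small lookup. The sketch below only restates the predictions given in the text; the class and function names are hypothetical, and the verbal labels are shorthand, not quantitative predictions.

```python
from enum import Enum


class Assimilation(Enum):
    """The four perceptual assimilation patterns described in the text."""
    TWO_CATEGORY = "two category"
    SINGLE_CATEGORY = "single category"
    CATEGORY_GOODNESS = "category goodness difference"
    NON_ASSIMILABLE = "non-assimilable"


# Qualitative predictions only; the wording of the labels is shorthand.
PREDICTED_DISCRIMINATION = {
    Assimilation.TWO_CATEGORY: "good",
    Assimilation.CATEGORY_GOODNESS: "intermediate and variable",
    Assimilation.NON_ASSIMILABLE: "intermediate and variable",
    Assimilation.SINGLE_CATEGORY: "most difficult",
}


def predicted_discrimination(pattern: Assimilation) -> str:
    """Return the predicted discrimination level for an assimilation pattern."""
    return PREDICTED_DISCRIMINATION[pattern]
```

Note that the two intermediate cases are predicted to depend on different factors: perceived phonetic similarity to the native category for category goodness differences, and acoustic similarity for non-assimilable contrasts.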

Thus, the issue of native-language (L1) 
influence on perception of nonnative speech 
contrasts focuses on the relation between phonetic 
details and phonemic categories. In turn, any 
readjustment in perception as a result of further 
experience with nonnative phones would seem to 
involve an adjustment in the perceived phonetic 
details of the second language (L2) phoneme 
categories (cf., Flege, 1990; Flege & Bohn, 1989). 
That is, nonnative phones may be assimilated to 
native phonemes to the strongest degree by 
listeners who have had little or no L2 experience. 
However, increased L2 experience may foster 
improved recognition of the discrepancies between 
the L1 and L2 phones. This could lead to a decline 
in degree of assimilation of L2 phones to L1 
categories, and perhaps ultimately to the 
emergence of a separate L2 phoneme category due 
to improved recognition of phonetic properties 
within the L2 phonological system. We pursued 
these issues in the present study by examining the 
perception of three English approximant contrasts 
by American English listeners and by Japanese 
listeners at two levels of English experience. 

Contrasts between approximant consonants (/w-j/, 
/w-r/, and /r-l/) in syllable-initial position offer a 
rich context for studying the perceptual influence 
of both phonetic and phonological differences 
between American English and Japanese. The 
contrasts differ across these languages in their 
phonological status; /r-l/ is a phonemic contrast in 
English but not in Japanese. The remaining two 
contrasts can be said to represent abstract 
phonological oppositions in both languages. 
However, /w-j/ and /w-r/ differ in terms of the 
similarities between American and Japanese 
phonetic realizations of the phonemic categories. 

Realizations of /j/ are quite similar in the two 
languages, differing only slightly in phonetic and 
phonotactic details. Both are glide consonants 
with a palatal place of articulation and spread or 
neutral lip posture. However, Japanese phonotactic 
constraints disallow the occurrence of /j/ before 
the high front vowels /i/ and /e/, whereas no such 
restrictions occur in English. Also, the starting 
tongue posture has been described as somewhat 
lower and further back for Japanese /j/ (Vance, 
1987) than for English /j/ preceding /a/ (the context 
used in this study), which should, if true, result in 
slightly higher F1 and lower F2 and F3 onset 
frequencies for Japanese /j/. 

The phonetic realization of /w/ differs more 
obviously between languages. In English, /w/ is 
realized with lip-rounding or protrusion ([w]), 
similar to the back rounded English vowel /u/, 
whereas in Japanese, /w/ is produced with spread 
lips ([ɰ]), similar to the back unrounded Japanese 
vowel [ɯ] (Bloch, 1950; Vance, 1987). Because lip 
rounding/protrusion lowers the frequency of all 
formants (especially upper formants), F2 and F3 
onset frequencies should be higher (hence more 
similar to English /j/) in Japanese than English 
(see Kasuya, Takeuchi, Sato, & Kido, 1982; 
Lisker, 1957; O'Connor, Gerstman, Liberman, 
Delattre, & Cooper, 1957). 

The cross-language discrepancy in the phonetic 
realization of /r/ is even greater, involving a 
difference in both manner of articulation and 
tongue posture. Whereas American English /r/ is a 
retroflex or palato-alveolar central approximant 
([ɻ] or [ɹ], respectively), Japanese /r/ is usually an 
alveolar tap [ɾ] rather than an approximant 
(Bloch, 1950; Price, 1981; Vance, 1987). In 
addition, while English /l/ is an alveolar lateral 
approximant, Japanese does not employ a distinct 
/l/ phoneme. Japanese /r/ is, in fact, variably 
pronounced, and is occasionally realized in some 
positions by some speakers as an approximant 
[ɻ] or [ɹ], as a retroflex stop [ɖ], as an alveolar trill [r], 
or even as a lateral alveolar tap [ɺ]. Thus, the 
lateral alveolar is a rare allophone of /r/ in 
Japanese and is apparently not even then an 
approximant; rhotic approximants may occur but 
are also quite rare (Bloch, 1950; Miyawaki, 1973; 
Vance, 1987). 

According to the perceptual assimilation model 
(Best et al., 1988; 1992), Japanese listeners would 
be expected to assimilate the English /w-j/ 
contrast as a two category contrast vis a vis their 
native phonology. However, the phonetic boundary 
between categories may be shifted toward /j/ 
(that is, Japanese may hear more /w/s), since the 
Japanese /w/ is unrounded and is more similar to 
English /j/ acoustically and articulatorily than is 
the American English /w/. Nonetheless, categorization 
and discrimination should be quite good. 
English /w-r/ might be expected to be assimilated 
to a single Japanese phoneme category, but as a 
contrast involving a category goodness difference. 
That is, since English /r/ is an approximant, not a 
tap as in Japanese, it seems likely to be assimilated 
as a "poor" exemplar of the Japanese approximant 
/w/, whereas English /w/ would be assimilated 
as a "better" exemplar of Japanese /w/. The 
possibility that /r/ would assimilate to Japanese 
/w/ is supported by evidence from Mochizuki 
(1981) and Yamada and Tohkura (1991). The 
alternative possibility, though less likely, is that 
English /r/ might be assimilated as a very poor 
exemplar of the Japanese tapped /r/, which would 
lead to two category assimilation for /w-r/. In either 
case, Japanese discrimination of /w-r/ should 
be good. Finally, English /r-l/ should result in single 
category assimilation by Japanese, in which 
both phones are equivalently poor exemplars either 
of their approximant /w/ or (less likely) of 
their tapped /r/. Japanese categorization and discrimination 
are known to be rather poor for syllable-initial 
/r/ and /l/, particularly for those who 
have had little conversational English experience 
(Miyawaki et al., 1975; Mochizuki, 1981). 

Best et al. (1988; 1992) discussed assimilation of 
nonnative speech contrasts only in terms of their 
relative levels of discriminability. In the present 
study, the concept of perceptual assimilation was 
extended to predict cross-language differences in 
phonetic category boundaries along synthetic approximant 
series that interpolated on multiple, 
phonetically-relevant acoustic parameters. 
Specifically, in identification tests of /w-j/ and /w-r/ 
series, the Japanese listeners were expected to label 
more of the acoustically intermediate stimuli 
as /w/ than American listeners. For /w-j/, which 
are distinguished primarily by F2 and F3 onsets 
and transitions, stimuli with higher F2 and F3 
values are more similar to Japanese [ɰ] than to 
American [w]. Thus, the Japanese /w-j/ boundary 
should be shifted toward /j/, relative to the 
American boundary. However, the steepness of 
the category boundary should be equivalent in the 
two language groups because the contrast reflects 
a phonological opposition for both. 

In the case of /w-r/, Japanese listeners might be 
expected to label more intermediate stimuli as /w/ 
rather than as /r/, as compared to American listeners, 
because the slow transitions of these approximants 
are more similar to the Japanese /w/ 
than to their tapped /r/ (see also Mochizuki, 1981). 
Yet because neither the English /w/ nor /r/ are 
ideal exemplars of Japanese phoneme categories, 
and because /w-r/ was expected to be assimilated 
as a category goodness difference within the 
Japanese /w/ category, their identification function 
was expected to be less steep in the region of the 
category boundary than that of American 
listeners. 

No clear predictions can be made about the 
location of the /r-l/ boundary for Japanese. 
However, the predicted single category 
assimilation pattern is consistent with previous 
findings that the labeling function is less clearly- 
defined for Japanese than for American listeners, 
resulting in a shallower slope at the category 
boundary (e.g., MacKain et al., 1981; Miyawaki et 
al., 1975). 
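
The predictions about boundary location and boundary steepness can be made concrete with a simple model of an identification function. The sketch below assumes a logistic labeling function, which is a common descriptive choice and not something specified in this study; the function names, boundary values, and slopes are hypothetical and are not data from the experiment.

```python
import math


def p_first_endpoint(step, boundary, slope):
    """Probability of labeling stimulus `step` as the first endpoint (e.g. /w/).

    Assumed logistic form: `boundary` is the 50% crossover in stimulus
    steps, and a larger `slope` gives a steeper category boundary.
    """
    return 1.0 / (1.0 + math.exp(slope * (step - boundary)))


def crossover(proportions):
    """Estimate the 50% crossover (1-based step) by linear interpolation.

    `proportions[i]` is the proportion of first-endpoint labels for
    stimulus i + 1. Returns None if labeling never crosses 50%.
    """
    for i in range(len(proportions) - 1):
        upper, lower = proportions[i], proportions[i + 1]
        if upper >= 0.5 > lower:
            return (i + 1) + (upper - 0.5) / (upper - lower)
    return None


steps = range(1, 11)
# Hypothetical listener groups: same steepness, boundaries one step apart.
group_a = [p_first_endpoint(s, boundary=5.5, slope=2.0) for s in steps]
group_b = [p_first_endpoint(s, boundary=6.5, slope=2.0) for s in steps]
# A shallower function, as predicted for a less clearly defined contrast.
shallow = [p_first_endpoint(s, boundary=5.5, slope=0.4) for s in steps]
```

In these terms, a boundary shift corresponds to a change in `boundary` with `slope` held constant, whereas a less clearly defined labeling function corresponds to a smaller `slope`.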

If increased L2 experience serves to shift adults' 
perception of the phonetic details of nonnative 
phonemes toward improved recognition of the discrepancies 
between L2 phones and the L1 categories 
to which they were initially assimilated (cf. 
Flege, 1989; 1990), additional predictions can be 
made about relative performance on the three con- 
trasts by Japanese subjects with more or less spo- 
ken English experience. According to perceptual 
assimilation predictions (Best et al., 1988; 1992), 
Japanese listeners with little English experience 
should discriminate the /w-j/ contrast best, as a 
two category contrast, with a peak in 
discrimination functions at their category boundary 
(i.e., shifted toward the /j/ end of the series). 
They should show lower discrimination levels and 
a lower, broader boundary-related peak (also 
shifted toward /r/) in discrimination of the English 
/w-r/ contrast, which shows a category goodness 
difference with respect to Japanese /w/. Their dis- 
crimination should be poorer still on the English 
/r-l/ contrast, a single category assimilation type. 
Thus, discrimination performance by inexperienced 
Japanese listeners should be equivalent to 
that of American listeners on the /w-j/ contrast, 
somewhat lower on the /w-r/ contrast, and perhaps 
even lower on the /r-l/ contrast. In comparing 
identification performance of Americans and the 
two Japanese subgroups, we expected that category 
boundary steepness for /w-j/ would be equivalent 
across all three groups, but less steep for the 
inexperienced Japanese than the other two groups 
on the /w-r/ and /r-l/ series. Japanese with more 
extensive English conversational training were 
expected to discriminate and identify all three 
contrasts in a pattern more similar to that of 
American adults than their peers who had had 
minimal English experience, i.e., the position and 
steepness of their category boundaries should 
have become shifted toward the values found in 
Americans. However, according to earlier work 
showing residual differences from Americans on 
syllable-initial /r-l/ (MacKain et al., 1981), even 
the experienced Japanese listeners were expected 
to differ somewhat from the Americans on the /w-r/ 
and /r-l/ series in both boundary position and 
steepness, as well as in discrimination levels. 

2. EXPERIMENT 1 
2.1 Method 

The aim of this study was to compare 
identification and discrimination of synthetic /r-l/, 
/w-r/, and /w-j/ series by American and Japanese 
listeners. A previous report had examined 
perception of an /r-l/ series by these two language 
groups (MacKain et al., 1981). The stimuli and 
methods for the /r-l/ tasks, as well as the results 
for a larger group of Japanese subjects on that 
contrast, were presented in the earlier 
publication. For the present paper, we reanalyzed 
a subset of those earlier-reported data for 
comparison with responses of the same listeners 
on the other two approximant series. 

2.1.1 Subjects. Nine of the 10 original American 
participants in the MacKain et al. study returned 
within the subsequent two weeks for two 
additional test sessions on the /w-r/ and /w-j/ 
contrasts. All were college undergraduates (4 
males, 5 females) recruited through notices posted 
at Yale University. 

Nine of the 13 Japanese who participated in the 
original study returned within two weeks for tests 
on the other two approximant contrasts. Four 
Japanese (2 males, 2 females) had had intensive 
English conversational instruction with native 
American English speakers (8-10 hours/week) and 
had been in residence in the USA for 18 to 48 
months at the time of testing (Ss 7-10 in MacKain 
et al., 1981). These subjects are hereafter referred 
to as the Experienced Japanese. Five others (4 
male, 1 female) had had little or no English 
conversational instruction (0^ hours/week) and 
had resided in the USA less than 7 months (Ss 1-4 
and S13 in MacKain et al., 1981). These are 
hereafter referred to as the Inexperienced 
Japanese. Note that S13 was subject M. K., an 
anomalous listener who showed remarkably good 
/r-l/ perception even though he had been in the 
U. S. only briefly and had had little conversational 
experience with English. He was discussed 
separately in MacKain et al., but was incorporated 
into the Inexperienced group for the present study 
because of the small number of subjects in each 
subgroup. 

All subjects were paid. All reported good hearing 
in both ears and could read written English. 

2.1.2 Stimulus Materials. The /r-l/ series was a 
/rak/-/lak/ continuum, and is described in detail in 
MacKain et al. (1981). Two additional series, 
/wak/-/jak/ and /wak/-/rak/, were generated in 
analogous manner on the OVE-IIIc cascade 
formant synthesizer at Haskins Laboratories. 
Synthesis parameters for series endpoints, /jak/, 
/wak/, /rak/ (and /lak/), were derived from an 
analysis of real speech tokens produced by an 
adult male speaker of American English. These 
endpoint synthetic stimuli were equated for 
overall duration (330 ms including the silence and 
burst of a natural /k/), amplitude and intonation 
contour (rising-falling), and spectral pattern of the 
final 105 ms of the 210 ms vocalic portion of the 
syllable. The initial 105 ms of the four stimuli 
differed in frequency of onset and the subsequent 
pattern of transitions of the first three oral 
formants (F1, F2, F3, respectively). Table 1 gives 
the onset frequencies of these formants for the 
four endpoint stimuli, and Figure 1 provides a 
schematic diagram of the formant patterns for the 
endpoint stimuli of each continuum. 

Table 1. Nominal stimulus parameters for endpoint
stimuli.

             Formant Onset Frequencies (Hz)
Stimuli        F1        F2        F3
/jak/         275      2105      2809
/wak/         275       644      2295
/rak/         349      1067      1477
/lak/         349      1207      2594

Onset frequencies are interpolated between endpoints to
produce each series (see text).




Figure 1. Schematic diagram of the center frequencies of F1, F2, and F3 in the endpoint stimuli for the three stimulus
series.

The 10-step /wak/-/jak/ series was generated by
interpolating on the F2 and F3 onset frequencies
in approximately equal steps of 162 Hz and 57 Hz,
respectively, from the /wak/ pattern (item 1) to the
/jak/ pattern (item 10). The initial steady-state
portion was 26 ms for F2. F3 was steady-state for
21 ms, followed by a linear transition of 49 ms to a
common frequency (2379 Hz). As can be seen in
Figure 1, this produced a "dip" in F3 for stimuli
toward the /jak/ end of the series, which is
characteristic of /j/ in natural utterances.

The 10-step /wak/-/rak/ series was generated by
interpolating between /wak/ (item 1) and /rak/
(item 10) on F1, F2, and F3 onset frequency (and
subsequent transitions) in approximately equal
steps of 8 Hz, 47 Hz, and 91 Hz, respectively. An
inflection point 28 ms after onset of F2 and F3,
and 21 ms after onset for F1, produced an initial
quasi-steady-state pattern (see Figure 1).
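
The step sizes quoted for the two series can be checked against the Table 1 endpoints. The following is a minimal sketch, assuming strictly linear interpolation of the onset frequencies across the 10 items; the function names are hypothetical, and the steady-state durations and transition shapes used in the actual synthesis are not modeled.

```python
# Nominal onset frequencies (Hz) for F1, F2, F3, taken from Table 1.
ENDPOINTS = {
    "wak": (275.0, 644.0, 2295.0),
    "jak": (275.0, 2105.0, 2809.0),
    "rak": (349.0, 1067.0, 1477.0),
}


def interpolate_series(start, end, n_steps=10):
    """Return n_steps (F1, F2, F3) onset triples, linearly spaced from start to end."""
    return [
        tuple(s + (e - s) * i / (n_steps - 1) for s, e in zip(start, end))
        for i in range(n_steps)
    ]


def step_sizes(start, end, n_steps=10):
    """Absolute per-step change (Hz) in each formant onset frequency."""
    return tuple(abs(e - s) / (n_steps - 1) for s, e in zip(start, end))


if __name__ == "__main__":
    for name in ("jak", "rak"):
        steps = step_sizes(ENDPOINTS["wak"], ENDPOINTS[name])
        print(f"/wak/-/{name}/ step sizes (Hz):", [round(x) for x in steps])
```

Under these assumptions the /wak/-/jak/ steps round to 162 Hz (F2) and 57 Hz (F3), and the /wak/-/rak/ steps round to 8 Hz, 47 Hz, and 91 Hz, in agreement with the values given in the text.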

For comparison, the endpoints of the /rak/-/lak/
series are included in Table 1 and in Figure 1. In
this series, onsets and transitions of F2 and F3
were varied, as well as the temporal pattern of the
F1 transition (see MacKain et al., 1981, for a
detailed description).

2.1.3 Procedure. The tests for the /rak/-/lak/
series are described in MacKain et al. (1981). The
tests for the other two series were similar in
format, except that the oddity discrimination test
used in the previous study was not employed; only
the AXB discrimination task was used for the
present report. All subjects completed two
sessions consisting of two tests each, with a
15-minute break between the first and second test of
the session. In one session subjects completed a
two-choice forced-choice identification test followed by
an AXB discrimination test of the /w-j/ series. The
other session included identification and AXB
discrimination tests of the /w-r/ series. Testing
was conducted in a sound-attenuated chamber
with 2-4 subjects at a time (all from a single
language group during a given test session).
Subjects listened over headphones (Telephonics
TDH-39) to stimuli presented via a Crown
reel-to-reel tape deck at a comfortable loudness level
(approximately 75 dB SPL).

Each identification test included 20 repetitions
of each of the 10 stimuli in the series being tested,
presented singly and randomized within each
block of 10 trials. Intertrial intervals (ITIs) were
2.5 s; interblock intervals (IBIs) were 4 s. For each
trial, subjects were asked to write one of two
letters to indicate the initial consonant of the
syllables they heard; that is, they wrote "W" or "Y"
during the /w-j/ identification tests, and "W" or "R"
during the /w-r/ identification tests.

The AXB discrimination procedure was chosen
because of its relatively low memory demands and
low sensitivity to observer bias, by comparison to
other standard discrimination procedures such as
oddity, 2IAX and 4IAX (e.g., Best, Morrongiello, &
Robson, 1981; MacKain et al., 1981; cf. Pollack &
Pisoni, 1971). Each AXB discrimination test
contained 10 repetitions of each of the 2 AXB
orders for the 7 possible pairings of stimuli that
differed by 3 steps along the continuum being
tested (1-4, 2-5, 3-6, 4-7, 5-8, 6-9, and 7-10). Trials
occurred in blocks of 14 (2 orders x 7 AXB
pairings), and were randomized within blocks.
Within-trial interstimulus intervals (ISIs) were 1
s, ITIs were 3 s, and IBIs were 6 s. For each trial,
the subject circled the number "1" or the number
"3" to indicate whether the second item of the trial
(X) matched the first (A) or the third (B) item of
that trial.
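
The structure of the AXB test can be illustrated with a short sketch. It assumes that the "2 AXB orders" are the two trial types in which X matches A or X matches B, with A always the lower-numbered stimulus; the text does not spell this out, so that reading and the function names below are hypothetical.

```python
import random


def three_step_pairs(n_stimuli=10, step=3):
    """All stimulus pairs that differ by `step` items along the continuum."""
    return [(i, i + step) for i in range(1, n_stimuli - step + 1)]


def build_axb_blocks(n_blocks=10, seed=0):
    """Build randomized blocks of (A, X, B) trials.

    Assumption: each pair appears once per block with X == A and once
    with X == B, giving 2 orders x 7 pairs = 14 trials per block.
    """
    rng = random.Random(seed)
    pairs = three_step_pairs()
    blocks = []
    for _ in range(n_blocks):
        block = [(a, a, b) for a, b in pairs] + [(a, b, b) for a, b in pairs]
        rng.shuffle(block)  # randomize trial order within the block
        blocks.append(block)
    return blocks


def correct_response(trial):
    """Return 1 if X matches the first item (A), 3 if it matches the third (B)."""
    a, x, _b = trial
    return 1 if x == a else 3


if __name__ == "__main__":
    blocks = build_axb_blocks()
    print(three_step_pairs())
    print(len(blocks), "blocks of", len(blocks[0]), "trials")
```

With 10 blocks of 14 trials, each pairing is presented 10 times in each order, as described above.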

2.2 Results. The results of identification tests
are reported first, followed by the results of
discrimination tests. Differences between the
American group and the Japanese group as a
whole were statistically analyzed. Performance by
Experienced and Inexperienced Japanese
subgroups were compared with the American group in
separate analyses. For all analyses, data on the
perception of /r-l/ by the 9 Americans and 9
Japanese, which were a subset of the data
reported previously in MacKain et al. (1981), were
included for comparison with results on the /w-r/
and /w-j/ series.

2.2.1 Identification tests. Figure 2 presents the
pooled identification functions for the American
and Japanese groups on the /w-j/, the /w-r/, and
the /r-l/ continua. These functions represent the
raw identification data, averaged over 9 subjects
in each group. As the figure shows, the American
listeners labeled /w-j/ and /w-r/ categorically, with
abrupt crossovers at category boundaries and
highly consistent labeling of within-category
stimuli. Performance was commensurate with
their identification of the /r-l/ series. The Japanese
as a group also labeled /w-j/ and /w-r/
categorically. This contrasts with their identification
performance on the /r-l/ series, which showed less
consistency in labeling within-category stimuli. As
previously reported, performance by the Japanese
was markedly different from that of the American
listeners on the /r-l/ series.

Figure 2. Average identification functions for the American and Japanese listener groups on the three series.



In order to make between-group comparisons on
the location and steepness of category boundaries
for the three series, best fit ogives of individual
subjects' identification functions were determined
through narrow-range PROBIT analyses, using
the labeling probabilities on the three stimuli
closest to the 50% crossover. This statistical
procedure fits a cumulative normal curve to the
raw data, thus smoothing the function. Category
boundaries were defined as the 50% intercept of
the ogives. The slopes of these ogives (1/s.d.)
indicate the peak rate of change in category
labeling at the crossover, and were used as a
reflection of the steepness of the category
boundaries, i.e., larger slope values indicate
steeper functions.
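
The boundary and slope measures can be illustrated with a simplified fit. The sketch below is not the PROBIT procedure itself, which uses weighted maximum-likelihood estimation; it assumes an unweighted least-squares line through the probit-transformed labeling proportions, which yields the same two quantities (the 50% intercept and the slope 1/s.d.). The function name and the clamping constant `eps` are hypothetical.

```python
from statistics import NormalDist


def fit_ogive(stimuli, proportions, eps=0.01):
    """Fit a cumulative normal to labeling proportions near the crossover.

    stimuli: stimulus numbers (e.g., the three items closest to 50%).
    proportions: proportion of responses in the second category per stimulus.
    Returns (boundary, slope), where boundary is the 50% intercept in
    stimulus units and slope equals 1/s.d. of the fitted normal.
    """
    std_normal = NormalDist()
    # Clamp proportions so the probit transform stays finite at 0 and 1.
    z = [std_normal.inv_cdf(min(max(p, eps), 1.0 - eps)) for p in proportions]
    n = len(stimuli)
    mean_x = sum(stimuli) / n
    mean_z = sum(z) / n
    sxx = sum((x - mean_x) ** 2 for x in stimuli)
    sxz = sum((x - mean_x) * (zi - mean_z) for x, zi in zip(stimuli, z))
    slope = sxz / sxx                  # z-units per stimulus step = 1/s.d.
    intercept = mean_z - slope * mean_x
    boundary = -intercept / slope      # stimulus value where z = 0 (50%)
    return boundary, slope


if __name__ == "__main__":
    # Synthetic listener with a true boundary at 5.4 and s.d. of 0.8 steps.
    true_curve = NormalDist(mu=5.4, sigma=0.8)
    xs = [4, 5, 6]
    ps = [true_curve.cdf(x) for x in xs]
    print(fit_ogive(xs, ps))
```

Because the slope is the reciprocal of the fitted standard deviation, a listener whose labeling changes over a narrower range of stimuli receives a larger slope value. The fit is undefined when all proportions are equal, since the slope is then zero.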

The ogives for the Americans and the two
Japanese subgroups are displayed in Figure 3.
χ² values were significant, indicating a significant
deviation between the raw data and the fitted
ogives, for only 6 out of the 54 PROBIT analyses
(2 groups x 9 subjects x 3 series): three Americans
on /w-r/, one American on /r-l/, and two
Experienced Japanese on /w-j/. In all cases, the
significant χ² resulted from extremely sharp
category boundaries that were not well-fitted to
three data points, and would have fit better for
two points. There were only two cases of grossly
nonmonotonic raw identification functions, for two
Inexperienced Japanese on /r-l/. In neither case
was the PROBIT χ² significant, i.e., the ogives
provided a good fit to the raw data.

2.2.1.1 Boundary location analyses. The
boundary locations for American and Japanese
groups (expressed in terms of stimulus number)
on each series are given in Table 2. These data
indicate that, on average, the boundaries for the
Japanese on all three series fell to the right of the
American boundaries. That is, the mean boundary
values show that the Japanese labeled more
stimuli as /w/ on the /w-j/ and /w-r/ series, and
more stimuli as /r/ on the /r-l/ series. Note also
that the variability of boundary locations appears
to be greater on the /w-j/ series than on the /w-r/
series for both Japanese and American subjects,
as reflected in the standard deviations (SD's).

Figure 3. Narrow-range fitted ogive functions for individual subjects in the American and Japanese groups. The S13
lines indicated in the Japanese plots refer to the data from subject M. K., discussed in MacKain et al. (1981) as being
inexperienced with American English conversation yet similar to Americans in categorization of /r/ and /l/.



To test the reliability of these boundary
differences, a Groups (American vs. Japanese) x Series
(/w-j/, /w-r/, /r-l/) analysis of variance (ANOVA) of
the 50% intercept values of best fit ogives for
individual subjects was conducted. The main effect of
Groups was significant, F(1,16) = 10.82, p < .005,
indicating that the Japanese boundaries were
indeed shifted significantly rightward in comparison
to the American boundaries. Neither the Series
main effect nor the Groups x Series interaction
approached significance (p's = .17 and .64,
respectively), suggesting that the rightward shift of the
Japanese boundary occurred in all three series,
and to approximately the same degree in each.
However, a priori predictions about possible
cross-language differences on the boundaries for each
series warranted an analysis of simple effects,
which indicated that the language difference was
significant for /w-j/, F(1,48) = 6.44, p < .02, but
was marginal for /w-r/ (p = .10) and nonsignificant
for /r-l/ (p = .24). That is, the boundary shift
between language groups was reliable only for /w-j/.



To assess the statistical reliability of the
differences between Experienced and Inexperienced
subgroups in comparison with American listeners,
an English Experience (American vs. Experienced
Japanese vs. Inexperienced Japanese) x Series
ANOVA was computed. (Because group sizes were
small and unequal, these statistical results should
be interpreted cautiously, although these factors
decrease rather than increase the likelihood of
attaining statistical significance.) The main effect
of English Experience was significant, F(2,15) =
6.75, p < .01, while the main effect of Series and
the English Experience x Series interaction were
nonsignificant. Planned linear contrasts among
the three groups, based on a priori predictions,
yielded reliable evidence that the boundary for the
Experienced Japanese subjects was intermediate
between that of the Americans and that of the
Inexperienced Japanese, F(1,15) = 13.12, p < .003.
Table 2 summarizes these differences in boundary
locations for the Experienced and Inexperienced
Japanese subjects.

2.2.1.2 Slope analyses. Table 3 presents the data
on steepness of category boundaries for American
and Japanese groups (expressed as the mean slope
of their ogives). The Japanese showed a pattern
across the three series that was strikingly
different from the Americans. The slope for /w-j/
was steepest and most similar to Americans',
while those for /w-r/ and /r-l/ were less steep than
Americans'. This was as predicted on the
reasoning that /w-j/ would constitute a two
category distinction for the Japanese, while /w-r/
would show a category goodness difference within
a Japanese category, and both /r/ and /l/ would
show a poor fit to one Japanese category.

The statistical reliability of these differences
was assessed in a Groups (American vs. Japanese)
x Series ANOVA of slope values. The main effect
of Groups was significant, F(1,16) = 5.47, p < .04,
indicating that, overall, the American boundaries
were significantly more abrupt than the Japanese
boundaries. Neither the Series main effect nor the
Groups x Series interaction was significant.
However, a priori predictions about
cross-language differences warranted simple effects
tests, which indicated that the American slopes
were steeper than the Japanese slopes on /w-r/,
F(1,16) = 5.77, p < .03, and /r-l/, F(1,16) = 11.58, p
< .04, but not on /w-j/ (p = .80).

Again, the Japanese data for Experienced and
Inexperienced subjects were analysed in an
English Experience x Series ANOVA which
included comparisons to the American group.
Although the main effect of English Experience
was only marginally significant, F(2,15) = 2.91, p
< .09, planned linear contrasts were warranted by
a priori predictions (American > Experienced
Japanese > Inexperienced Japanese). These tests
revealed that the predicted direction of group
differences was significant for /r-l/, F(1,15) = 7.36, p <
.02, and /w-r/, F(1,15) = 5.03, p < .05, but not for
/w-j/ (p = .99), all as expected. No other effects
were significant.

To summarize, the Japanese /w-j/ boundary was
shifted toward /j/ relative to the American
boundary. Both Experienced and Inexperienced
Japanese labeled more intermediate stimuli as /w/
than the Americans, as predicted from
cross-language differences in the phonetic details of /w/.
Also as predicted, the steepness of the category
boundary slope on this series did not differ
between language groups, indicating that the
division between /w/ and /j/ categories was equally
sharp for all groups of listeners. These findings
suggest that the American /w-j/ distinction was
assimilated as a two category contrast by the
Japanese listeners, with /w/-like and acoustically
intermediate stimuli assimilating to the
phonetically different Japanese /w/, and /j/-like
stimuli assimilating to the phonetically similar
Japanese /j/ phoneme category. This
characterization is somewhat qualified, however, by the
discrimination results on /w-j/ (see below).



Table 2. Boundary locations for American English and Japanese listeners, including Japanese subgroups. Numerical
values represent stimulus numbers along each of the test series. Standard deviations are given in parentheses.

                         /w-j/           /w-r/           /r-l/
Americans             5.36 (1.05)     4.93 (0.57)     5.53 (0.96)
Japanese: Overall     6.55 (1.07)     5.72 (0.70)     6.08 (1.40)
  Experienced         6.32 (0.98)     5.59 (0.69)     5.60 (0.74)
  Inexperienced       6.73 (1.22)     5.82 (0.77)     6.47 (1.76)



Table 3. Slope values for American and Japanese listeners, including Japanese subgroups. Numerical values
represent the peak rate of change in category responses per step along each stimulus series. Standard deviations
are given in parentheses.

                         /w-j/           /w-r/           /r-l/
Americans             2.19 (1.29)     1.99 (1.05)     2.65 (1.67)
Japanese: Overall     2.04 (1.29)     1.09 (0.41)     1.04 (1.23)
  Experienced         1.84 (0.57)     1.24 (0.49)     1.75 (1.55)
  Inexperienced       2.20 (1.74)     0.97 (0.33)     0.48 (0.55)

The identification results were different for the
/w-r/ and /r-l/ series than for /w-j/. As previously
reported, the Japanese listeners showed
significantly shallower category boundary slopes
on /r-l/, but failed to show a significant difference
in boundary location, relative to Americans. On
/w-r/, the Japanese again showed a shallower
boundary slope than Americans, and their
boundary location differed marginally from
Americans' (p = .10) in the predicted direction (i.e.,
they identified more stimuli as /w/). The /w-r/ and
/r-l/ findings are consistent with the reasoning
that American English /w-r/ should constitute a
category-goodness difference within the Japanese
/w/ category, and that English /r-l/ should
represent rather poor examples of a single
phoneme category in Japanese (either their glide
/w/ or, less likely, their tapped /r/).

As for the effect of experience with L2, the
patterns of identification performance differed as
expected between the two levels of English
conversation experience of the Japanese subjects.
On all counts, the data of the Experienced
Japanese subjects were more similar (but not
identical) to the American results than were those
of the Inexperienced Japanese. More intensive
English conversation experience was associated
with a more American-like boundary location on
the English /w-j/ contrast and with steeper
category boundaries for the English /w-r/ and /r-l/
contrasts.

2.2.2 Discrimination tests. Discrimination test
results were also examined for evidence of native
language differences and influences of L2 English
experience. Percent correct responses for each of
the AXB comparison pairs on each stimulus series
were computed for the American and Japanese
groups. Pooled discrimination functions for the
Japanese and American groups are displayed in
Figure 4, and mean performance levels (overall
percent correct) are presented in Table 4. The
relationship between American and Japanese
discrimination functions varied considerably
across the three series.



Figure 4. Average discrimination functions for the American and Japanese groups on the three series. [Panels: W-Y,
W-R, R-L; x-axis: stimulus pair (1-4, 2-5, 3-6, 4-7, 5-8, 6-9, 7-10); groups: Americans (9), Japanese (9).]

Table 4. Mean correct performance levels pooled for American and Japanese listeners on the AXB discrimination
task, including Japanese subgroups.

                         /w-j/            /w-r/            /r-l/
Americans             74.32 (11.94)   74.68 (17.46)   77.78 (19.32)
Japanese: Overall     77.14 (12.97)   65.48 (15.86)   64.13 (14.99)
  Experienced         78.04 (11.57)   66.43 (18.00)   67.30 (15.35)
  Inexperienced       76.43 (14.12)   64.71 (14.14)   61.43 (14.17)



The data were entered into a Groups x Series x
Comparison Pairs (1-4, 2-5, 3-6, 4-7, 5-8, 6-9, 7-10)
ANOVA. A significant Groups main effect, F(1,16)
= 8.55, p < .01, indicated that Japanese were less
accurate overall in discrimination than were
Americans. The significant main effect for
Comparison Pairs, F(6,96) = 30.87, p < .001,
indicated that overall there were peaks and
troughs in discrimination performance across the
three series. The latter effect was qualified, as
expected, by a Comparison Pairs x Groups
interaction, F(6,96) = 3.39, p < .005, indicating
that, in general, the Japanese showed smaller
discrimination peaks than the American listeners.
The significant Series effect, F(2,32) = 3.64, p <
.04, revealed that discrimination performance was
somewhat higher overall for /w-j/ than for the
other two series. However, Series interacted with
Group, F(2,32) = 6.68, p < .004; as expected,
cross-series mean performance differed between
language groups. Simple effects tests of this
interaction revealed that mean performance
differed among series for the Japanese, F(2,16) =
12.77, p < .0005, being substantially better for
/w-j/ (77% correct) than for /w-r/ (65%) or /r-l/ (64%).
Planned comparisons provided support for the
order of performance that had been predicted on
the basis of expected phonemic assimilation
patterns (/w-j/ > /w-r/ > /r-l/), F(1,16) = 25.313, p <
.0001. However, a test of simple effects showed
that the Americans' mean discrimination did not
differ significantly across series, p = .58.

Comparison Pairs and Series also interacted
significantly, F(12,192) = 6.48, p < .001, indicating
differences in the cross-series patterns of
discrimination peaks for both groups, which were
further qualified by a significant Groups x
Comparison Pairs x Series interaction, F(12,192) =
3.04, p < .002. To interpret these interactions,
separate ANOVAs for Groups x Comparison Pairs
were computed for each stimulus series. As
predicted, analysis of the /w-j/ series yielded no
significant difference between groups in overall
discrimination accuracy. A significant main effect
of Comparison Pairs, F(6,96) = 21.14, p < .001,
revealed that both groups showed two peaks of
relatively accurate discrimination. The occurrence
of a double peak suggests that both Japanese and
American listeners differentiated three rather
than two categories along this synthetic
continuum, although they could not indicate this
in the two-category forced-choice identification
test. (This possibility is considered further below
and in Experiment 2.) The significant Groups x
Comparison Pairs interaction, F(6,96) = 3.46, p <
.01, was due to the fact that Japanese and
American listeners performed differently on both
within-category extremes of the series (Pairs 1-4
and 7-10). As indicated in Figure 4, Japanese
subjects discriminated Pair 7-10 (within-category
for /j/) more accurately, while Americans
discriminated Pair 1-4 (within-category for /w/)
more accurately. This asymmetry in
discrimination of the endpoint within-category
comparison pairs is compatible with the fact that
the Japanese category boundary was shifted
significantly more toward /j/ than was the
American boundary. That is, both stimuli 10 and 7
fell within the /j/ category for Americans (99% and
87% of identification responses, respectively), but
for the Japanese stimulus 7 was quite near the
/w-j/ boundary (59% identification as /j/) while
stimulus 10 was a clear /j/ (100%), which resulted
in better discrimination by the latter language
group. Conversely, at the other end of the series,
the Japanese and Americans agreed that stimulus
1 was a clear /w/ (97 and 98%, respectively), but
whereas the Japanese also identified stimulus
item 4 as /w/ 98% of the time, the Americans gave
only 87% /w/ identifications. Thus the Japanese
discriminated comparison pair 1-4 near chance,
while the Americans discriminated that pair more
readily. In fact, Americans showed the same level
of performance as on pair 7-10, which had
received quite similar identification scores. No
other /w-j/ discrimination pairs differed between
language groups.
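
The link drawn above between identification percentages and discrimination accuracy can be made concrete with a small calculation. The sketch below is not an analysis reported in this study; it assumes the classical labeling-based prediction for ABX-type tasks, P(correct) = 0.5 + 0.5 (p1 - p2)^2, applied here to AXB pairs, and it uses only the identification percentages quoted in the preceding paragraph. The function and variable names are hypothetical.

```python
def predicted_accuracy(p1, p2):
    """Labeling-based prediction of discrimination accuracy.

    p1, p2: proportions of one category label given to the two stimuli.
    Assumption: listeners discriminate only on the basis of category
    labels, so accuracy is 0.5 (chance) plus half the squared difference.
    """
    return 0.5 + 0.5 * (p1 - p2) ** 2


# Identification proportions quoted in the text.
# Pair 7-10: proportion of /j/ responses to stimulus 7 and stimulus 10.
# Pair 1-4:  proportion of /w/ responses to stimulus 1 and stimulus 4.
IDENTIFICATION = {
    "Americans": {"7-10": (0.87, 0.99), "1-4": (0.98, 0.87)},
    "Japanese": {"7-10": (0.59, 1.00), "1-4": (0.97, 0.98)},
}

PREDICTED = {
    group: {pair: predicted_accuracy(*props) for pair, props in pairs.items()}
    for group, pairs in IDENTIFICATION.items()
}

if __name__ == "__main__":
    for group, pairs in PREDICTED.items():
        for pair, value in pairs.items():
            print(f"{group:10s} pair {pair}: predicted P(correct) = {value:.4f}")
```

Under this assumption the predicted accuracy is higher for the Japanese than for the Americans on pair 7-10, and higher for the Americans than for the Japanese on pair 1-4, which is the direction of the asymmetry described in the text. The absolute values should not be read as estimates of the observed performance.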

The pattern of discrimination was quite
different on the /w-r/ series. A significant Comparison
Pairs effect, F(6,96) = 9.70, p < .001, reflected a
single peak in discrimination performance, with
troughs on either side. A significant Groups effect,
F(1,16) = 8.64, p < .01, indicated that
discrimination was less accurate overall for Japanese than
for American listeners. This was due to their
poorer performance on pairs at the /w/ end of the
continuum (1-4, 2-5) and on cross-category pairs
(3-6, 4-7), as indicated by a significant Groups x
Comparison Pairs interaction, F(6,96) = 3.23, p =
.01, and simple effects tests of individual pairs.
Thus, while both groups showed a single
discrimination peak, the Japanese peak was shifted
slightly toward the /r/ end of the continuum, and
was broader and lower than the American peak.
Both of these effects are consistent with
cross-language phonemic and phonetic differences, as
discussed in the Introduction. The identification
test had provided marginal evidence that the
Japanese /w-r/ boundary was shifted toward the
/r/ end of the continuum, relative to the
Americans' boundary, a pattern now corroborated
by the small rightward shift of the peak in the
Japanese discrimination function. This shift,
although slight, is compatible with the greater
cross-language phonetic similarities for /w/ than
for /r/. As was argued earlier, the lack of rounding
in the Japanese /w/ should lead Japanese listeners
to identify more /w/s in the /w-r/ (as well as the
/w-j/) series. Correspondingly, the poor fit of
English /r/ to either the Japanese /w/ or the
Japanese /r/ categories should converge on
perception of fewer /r/s by the Japanese on the
/w-r/ series. English /w-r/ was expected to be
assimilated as a category goodness difference
within Japanese /w/, English /r/ being heard as a
poor Japanese /w/. The lower, broader peak in
Japanese discrimination, relative to the American
/w-r/ peak and to the Japanese /w-j/ peak(s), is
compatible with this hypothesis.

Finally, as previously reported for larger groups
(MacKain et al., 1981), results on /r-l/ indicated
significant differences between Groups, F(1,16) =
10.14, p < .006, and between Comparison Pairs,
F(6,96) = 17.74, p < .001, as well as a significant
Groups x Comparison Pairs interaction, F(6,96) =
2.90, p < .02. Japanese subjects discriminated
cross-category pairs (3-6, 4-7, 5-8) much more
poorly than Americans. This was expected, and is
compatible with the hypothesis that Japanese
listeners assimilate English /r-l/ as poor
exemplars of a single category in their own
language. Note also the difference in Japanese
performance on /w-r/ versus /r-l/ in Figure 4. Their
minimal "peak" in discrimination of the
cross-category /r-l/ pairs is clearly lower and broader
than their peak in discrimination of /w-r/. This
relation is compatible with the hypothesis that /r/
and /l/ are assimilated to a single native category,
whereas the /w-r/ contrast constitutes a category
goodness difference for Japanese.

Differences in discrimination performance by
Experienced and Inexperienced Japanese
subgroups were also considered. Overall accuracy
across English Experience and Series is shown in
Table 4 and Figure 5. Both Japanese subgroups
performed relatively well on the /w-j/ series; mean
levels were similar to the Americans'. For /w-r/ the
Japanese subgroups showed similar performance
levels (but note the difference in the position of
their performance peaks, Figure 5), although their
performance was lower than the Americans'.
Inexperienced Japanese showed lower /r-l/
performance than Experienced Japanese, but again
both groups performed less well than Americans.

An English Experience (Americans, Experienced
Japanese, Inexperienced Japanese) x Series x
Comparison Pairs ANOVA revealed significant
effects of English Experience, F(2,15) = 4.70, p <
.03, Series, F(2,30) = 6.16, p < .01, and
Comparison Pairs, F(6,90) = 25.40, p < .01, as well
as significant two-way and three-way interactions
[Series x English Experience, F(4,30) = 3.34, p <
.03; Comparison Pairs x English Experience,
F(12,90) = 1.93, p < .05; Series x Comparison Pair,
F(12,180) = 6.24, p < .001; Series x Comparison
Pair x English Experience, F(24,180) = 2.04, p <
.01]. Analyses of simple effects for Series within
Japanese subgroups showed no significant
differences in overall accuracy across series for the
Experienced Japanese (p = .10), although peaks
and troughs were positioned differently across
series, as indicated by their significant Series x
Comparison Pairs interaction, F(12,36) = 4.84, p <
.01. In contrast, a significant Series effect for the
Inexperienced Japanese indicated more accurate
discrimination of /w-j/ pairs than of /w-r/ or of /r-l/
pairs, F(2,8) = 9.31, p < .01. A planned linear
contrast on the predicted performance pattern
(/w-j/ > /w-r/ > /r-l/) was also significant for the latter
subgroup, F(1,2) = 16.85, p < .01.

Figure 5. Average discrimination functions for the Experienced and Inexperienced Japanese subgroups on the three
series. [Panels by series; x-axis: stimulus pair (1-4 through 7-10); subgroups: Inexperienced (5), Experienced (4).]



Experienced and Inexperienced subjects
performed almost identically on the /w-j/ series;
both groups displayed double peaked functions,
which suggest that all the Japanese subjects could
differentiate acoustically intermediate stimuli
from both /w/ and /j/ phonetic endpoints. An
English Experience x Comparison Pairs simple
effect ANOVA for /w-j/ revealed no significant
effect of English Experience (p = .66) and a
marginally significant English Experience x
Comparison Pairs interaction (p = .08). The latter
suggests a tendency for the discrimination peaks
to be higher, and for the peak between /j/ and the
intermediate stimuli to be shifted toward /j/, in
both Japanese subgroups relative to the
Americans.

There were obvious differences in the pattern of
discrimination for Experienced and Inexperienced
subgroups on /w-r/ and /r-l/. Separate English
Experience x Comparison Pairs analyses revealed
significant overall group differences in
discrimination of /w-r/, F(2,15) = 4.16, p < .04, and of /r-l/,
F(2,15) = 5.56, p < .02. Planned linear contrasts
indicated that the expected ordering of
performance (American > Experienced Japanese >
Inexperienced Japanese) was significantly upheld
for both series [F(1,2) = 6.85, p < .02 and F(1,2) =
10.38, p < .01, respectively]. Performance by the
two Japanese subgroups on /w-r/ suggested an
effect of experience on the location of the phonetic
boundary. This was corroborated by a significant
English Experience x Comparison Pairs
interaction, F(6,90) = 2.08, p < .04. While discrimination
for Experienced Japanese was most accurate for
comparison pair 4-7 (as it was for Americans), the
Inexperienced Japanese performed best on pair
5-8. For /r-l/, English experience instead affected the
height of the discrimination peak across the
category boundary. Consistent with the larger dataset
reported in MacKain et al. (1981), Experienced
Japanese showed better discrimination than
Inexperienced Japanese on cross-category pairs
(4-7, 5-8).

2.3 Discussion
Both the identification and the discrimination
results are consistent with predictions based on
the perceptual assimilation model (Best, 1992;
Best et al., 1988). That is, American English /w-r/
appears to be perceived as a category goodness
difference within one Japanese phoneme category
(/w/), and /r-l/ are perceived as poor examples of a
single category. The identification results and the
mean discrimination performance levels on /w-j/
are compatible with the hypothesis that the
phones are assimilated to two different Japanese
categories (but see the qualifications discussed
below). Analyses of the two Japanese subgroups
further corroborated predictions. Specifically,
Experienced Japanese performed more like
Americans than did the Inexperienced Japanese
on all series and measures except for
discrimination of /w-j/. On that series, there were no
cross-language differences (as expected) except for
the within-category comparison pairs at the
endpoints of the series; this pattern is compatible
with language differences in the phonetic
properties of /w/.

There was a surprise, however, in the
discrimination results for the /w-j/ series. The double peak
in discrimination by the Americans and both
Japanese subgroups suggested that all listeners
may have perceived three rather than two
categories along the series, with some category
intermediate between /w/ and /j/ perceived in the
central portion of the series. This suggests the
possibility that the /w-j/ series actually constitutes a
combination of a two category distinction for
Japanese (/w-j/), along with a category goodness
difference within one of those categories.
Comparison between the Japanese identification
function and their discrimination performance
indicates that most of the intermediate category
tokens (5-7) were labeled as ambiguous /w/s. These
items were apparently difficult to discriminate
from one another but easy to discriminate from
"good" /w/s (i.e., items 1-3, consistently labeled as
/w/), suggesting a goodness-of-fit distinction
within the Japanese /w/ category. Indeed, when
the experimenters listened to this synthetic series,
several items near the center of the series were
perceived as /l/-like. Consistent with this
perception, the F1, F2, and F3 onset frequencies and
transition patterns in the central stimuli of the
/w-j/ series were quite similar to those of the stimuli
in the /r-l/ series that were identified by
Americans as /l/. The suggestion that the /w-j/
series actually contained three identifiable
categories, /w-l-j/, was examined further with a naive
group of Americans in Experiment 2.

3. EXPERIMENT 2 

3.1 Method 

3.1.1 Subjects. As the original subjects were no 
longer available for testing, nine new native 
English-speaking American subjects (3 males, 6 
females) participated in the study. Seven were 
graduate students; the other two were faculty 
members. All reported normal hearing in both 
ears. Two additional subjects were eliminated 
from the final sample after testing, when they 
indicated that they had been diagnosed as 
learning disabled in childhood. Both had 
phonemic categorization difficulties, having failed 
to consistently categorize and discriminate 
synthetic /ra/-/la/ in a separate but concurrently 
run study. 

3.1.2 Stimuli and Procedures. The /w-j/ series 
from Experiment 1 was again employed. The 
procedure and testing conditions were identical to 
those of Experiment 1, except that the 
forced-choice identification test included three response 
alternatives ("W," "L," "Y") rather than two. 

3.2 Results 

3.2.1 Identification test. As illustrated in the left 
side of Figure 6, subjects consistently divided the 
continuum into three sharply-defined categories. 
Table 5 lists the means and standard deviations of 
the boundary location and slope values for both 
boundaries, computed from PROBIT analyses as 
in Experiment 1. Three of the 18 fitted ogives 
deviated significantly from the raw data, 
according to χ² analyses, two on the /l-j/ boundary 
and a third on the /w-l/ boundary. In all cases, the 
ogive was the best fit obtainable, and the 
significant χ² values were due to extremely steep 
category boundary slopes. 
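
As a rough illustration of how a boundary location and slope can be read off an identification function, the sketch below fits a straight line to probit-transformed response proportions. It is a simplified least-squares stand-in, not the maximum-likelihood PROBIT procedure used in the study; the function name `fit_probit_boundary`, the cutoff `eps`, and the ten-step example data are all hypothetical and serve only to show the idea.

```python
from statistics import NormalDist


def fit_probit_boundary(stimuli, proportions, eps=0.01):
    """Least-squares line through probit-transformed proportions.

    Returns (boundary, slope). The boundary is the stimulus value at
    which the fitted ogive crosses 50%; the slope is in z-units per
    stimulus step. Simplified sketch, not full maximum-likelihood probit.
    """
    std_normal = NormalDist()
    # Keep only unsaturated points: the inverse normal CDF is undefined
    # at exactly 0 or 1, and near-saturated points carry little information.
    points = [(x, std_normal.inv_cdf(p))
              for x, p in zip(stimuli, proportions)
              if eps < p < 1 - eps]
    if len(points) < 2:
        raise ValueError("need at least two unsaturated points to fit a line")

    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_z = sum(z for _, z in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in points)
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    return -intercept / slope, slope


# Hypothetical identification data: proportions generated from a known
# ogive (boundary 3.5, slope 1.0), not taken from the experiment.
stimuli = list(range(1, 11))
true_ogive = NormalDist(mu=3.5, sigma=1.0)
proportions = [true_ogive.cdf(x) for x in stimuli]

boundary, slope = fit_probit_boundary(stimuli, proportions)
print(round(boundary, 2), round(slope, 2))  # approximately 3.5 and 1.0
```

With very steep identification functions, few or no unsaturated points remain, which is one way to see why extremely steep category boundaries can be hard to fit well with an ogive.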

The location of /w-l/ and /l-j/ boundaries obtained 
in the three-choice identification task was 
compared with the /w-j/ boundaries obtained in 
the two-choice task of Experiment 1. A Groups 
(Americans-Exp. 2 vs. Americans-Exp. 1 vs. 
Japanese-Exp. 1) x Comparison Pairs ANOVA 
comparing the /w-l/ boundary with the /w-j/ 
boundaries yielded a significant main effect of 
Groups, F(2,24) = 25.04, p < .001. Scheffe's tests 
showed that the /w-l/ boundary differed from both 
the American and Japanese /w-j/ boundaries in 
Experiment 1 (p < .01). In a separate ANOVA 
comparing the /l-j/ boundary with /w-j/ boundaries 
from Experiment 1, there was again a significant 
main effect of Groups, F(2,24) = 7.86, p = .001. 
Scheffe's tests indicated that the /l-j/ boundary 
again differed from the Americans-Exp. 1 /w-j/ 
boundary (p < .01). However, it did not differ from 
the Japanese /w-j/ boundary (p = .35). Thus, while 
the Experiment 1 discrimination results suggest 
that the Japanese had actually perceived three 
categories along the /w-j/ series, as do Americans, 
the latter result suggests that the Japanese 
assimilated the intermediate tokens to their /w/ 
category but as perceptibly poorer exemplars of 
that category. 

Neither the /w-l/ nor the /l-j/ slope values 
differed from those found for either group in 
Experiment 1. 

Figure 6. Identification and discrimination functions for the 3-category tests on the /w-j/ series with Americans in 
Experiment 2. 

Table 5. Category boundary locations and slope values 
for Americans' three-choice identification of the /w-j/ 
series (Experiment 2). 

                     /w-l/           /l-j/ 
                     mean (SD)       mean (SD) 
Boundary Location    3.26 (0.84)     7.20 (0.59) 
Boundary Slope       2.53 (1.39)     3.17 (1.17) 


3.2.2 Discrimination test. As can be seen in the 
right side of Figure 6, the discrimination function 
again showed two peaks of relatively accurate 
performance, which coincided with the two category 
boundaries revealed in the 3-choice identification 
task. For comparison with Experiment 1, a 
Groups (Japanese-Exp. 1, Americans-Exp. 1, 
Americans-Exp. 2) x Comparison Pairs ANOVA 
was conducted. The Groups main effect was 
nonsignificant (p = .66), indicating no systematic 
differences among groups in overall discrimination 
performance. The significant Comparison Pairs 
effect, F(6,144) = 29.68, p < .001, revealed that there 
were two reliable peaks in discrimination. Finally, 
the Groups x Comparison Pairs interaction was 
significant, F(12,144) = 2.48, p < .01, due primarily 
to differences among the groups in discrimination 
of the within-category pairs (1-4, 3-6, 7-10). 
However, the locations of discrimination peaks did 
not differ among the three subject groups. 

3.3 Discussion. The results of Experiment 2 
confirm that the intermediate category suggested 
by the double peak in the Experiment 1 
discrimination functions was identified by 
Americans as /l/. As suggested earlier, this 
categorization is interpretable on the basis of the 
similarity between the acoustic properties of /l/ 
and those of the intermediate tokens in the /w-j/ 
series (see Figure 1). For intermediate tokens, F1 
had a steady-state onset, followed by a moderately 
steep transition, like /l/ but unlike /r/ in the /r-l/ 
series. They had F2 onsets around 1200-1400 Hz, 
with a shallow falling transition, again like /l/ in 
the /r-l/ series. Moreover, their F3 transitions 
were nearly flat or slightly falling, like that of /l/ 
in the /r-l/ series, except for a slight dip in 
frequency just before reaching the vowel steady 
state. In particular, the F3 onset frequency of 
these stimuli was not close to the frequency of F2, 
which is needed for good /r/ perception. Given that 
Japanese does not employ an /l/ phoneme, this 
intermediate category may have been 
discriminated from both /w/ and /j/ as a category 
goodness distinction, most likely within the 
Japanese /w/ category. 

4. General Discussion 
The results of Experiment 1 revealed 
language-specific influences in the perception of English 
approximant contrasts by adult native speakers of 
American English and Japanese. Identification 
and discrimination performance were consistent 
with cross-language differences in both the 
phonemic status and the phonetic details of the 
three contrasts. Both language groups showed 
sharp category boundaries and high discrimination 
peaks on the /w-j/ series, which represents a 
phonemic contrast in both languages. However, 
there were group differences in the location of the 
/w-j/ category boundary. The Japanese identified 
more items as /w/, consistent with cross-language 
phonetic differences in degree of lip-rounding 
during production of /w/. On the /w-r/ series, the 
Japanese showed a more gradual crossover in 
identification functions and less accurate 
between-category discrimination than the Americans. In 
addition, a marginal shift in boundary location 
and discrimination peak suggested that Japanese 
categorized more intermediate tokens as /w/ than 
Americans did. This pattern is also consistent 
with cross-language differences in the phonetic 
realization of the /w-r/ contrast. Thus, while in 
abstract phonological terms /w/ vs. /r/ is a 
distinctive contrast in Japanese, the phonetic differences 
across languages led to distinctly different 
patterns of perception of the synthetic /w-r/ stimuli. 
As for /r-l/, the Inexperienced Japanese showed 
much less consistent identification functions and 
markedly poorer discrimination than the 
Americans. However, there was no significant 
shift in boundary location relative to Americans, 
in keeping with earlier reports (MacKain et al., 
1981; Miyawaki et al., 1975). This group 
difference is compatible with the fact that /r-l/ is a 
phonemic distinction only in English, and that 
neither segment is phonetically similar to the 
Japanese /r/. 

This pattern of cross-language differences 
supports predictions based on the perceptual 
assimilation model proposed by Best and colleagues 
(Best, 1992; Best et al., 1988) to explain variations 
in the difficulty of discriminating nonnative 
segmental contrasts. Specifically, Japanese 
listeners were expected to assimilate the English /w-j/ 
contrast as a two category contrast. The pattern of 
Japanese listeners' sharp category boundary and 
high discrimination performance on the /w-j/ 
series was consistent with this prediction. English 
/w-r/ was expected to be assimilated to Japanese 
as a contrast involving a category goodness 
difference, with /r/ most likely being assimilated as a 
poor exemplar of Japanese /w/. Japanese listeners' 
more gradually sloping identification function 
and lower discrimination peak for the /w-r/ series 
were compatible with this prediction. Finally, 
English /r-l/ was expected to be assimilated to a 
single category by Japanese, with both phones 
representing poor exemplars of either the 
Japanese /w/ or, less likely, of their tapped /r/. 
Once again, the more poorly defined category 
boundary and lower discrimination performance of 
the Japanese listeners were consistent with this 
prediction. 

The present study extended the model of 
perceptual assimilation from simple predictions 
about discriminability of nonnative segmental 
contrasts to two measures of how nonnative 
segments are actually categorized by listeners. 
The location of the category boundary differed 
between the two groups, consistent with the 
articulatory-phonetic (and acoustic-phonetic) 
differences between the American English and the 
Japanese /w-j/ contrast. Specifically, the Japanese 
perceived more tokens as /w/ than the Americans, 
in keeping with observations that Japanese /w/ is 
more similar to /j/ acoustically and articulatorily 
than is English /w/. The stimulus items in the /w-j/ 
series that were identified as /w/ by Japanese but 
as /j/ by Americans in Experiment 1 were just 
those items perceived as /l/-like by Americans 
when they were given a 3-way choice (/w-l-j/) in 
Experiment 2. Language-specific differences in the 
phonetic details of the phoneme contrast "shared" 
by the two languages resulted in a divergence 
between language groups in the location but not 
the steepness of the /w-j/ category boundaries 
across Experiments 1 and 2, which supports the 
notion that the Japanese listeners assimilated the 
nonnative segments to the familiar categories of 
their native phonological system. This 
language-specific boundary shift extends Lisker & 
Abramson's (1970) classic findings on 
cross-language differences in the voice-onset-time 
boundary for stop consonants to a 
place-of-articulation distinction for approximants. 
Moreover, the cross-language differences in 
identification and discrimination of /w-r/ (and /r-l/) 
are quite consistent with differences in the 
phonemic status and phonetic details of those 
contrasts with respect to the two languages. 

The results of this study are also relevant to 
Flege's account of cross-language differences in 
speech perception. According to his Speech 
Learning Model (1988, 1990), adult learners 
perceive phones of the L2 on the basis of their 
phonetic similarity to native language (L1) 
categories. Highly dissimilar phones (referred to as 
New phones) are initially difficult to categorize 
perceptually, but with L2 experience, learners 
form distinct L2 phonetic representations of these 
categories, which leads to improvement in both 
their perception and production. Phones which are 
identical to or highly similar to native phones 
(Identical phones) are easily perceived even by 
beginning learners, because they "fit" L1 
categories. Phones which are similar to but not 
identical with L1 categories ("Similar" phones) are the 
most problematic for L2 learners. They continue 
to classify Similar phones according to L1 
categories even after considerable experience, which 
leads to continued "accented" production and 
difficulties perceiving that the L2 phones differ from 
those of L1. Thus, Flege's model assumes that L2 
phones are equated with L1 phonemes in a 
dichotomous, all-or-none fashion; i.e., they are either 
fully equated with an L1 phone or fail to be 
equated to an L2 phone. By comparison, the 
perceptual assimilation model (Best, 1992) instead 
assumes that listeners can perceive variations in 
the goodness of fit of an L2 phone to an L1 
phoneme category. The latter assumption is 
compatible with findings that listeners are sensitive to 
the category goodness of stimulus variations 
within a given native category (e.g., Grieser & 
Kuhl, 1989; Miller & Volaitis, 1989). Also note 
that Flege's model was developed to address 
perceived similarities between individual L2 
phones and individual L1 phoneme categories, 
whereas the perceptual assimilation model was 
developed to address the perception of L2 
contrasts. 

If we extend the Flege model to perception of 
non-native contrasts between phones, the results 
of Experiment 1 are partially consistent with that 
model. According to Flege's classification scheme, 
English /j/ is Identical, /w/ is Similar, and /r/ and 
/l/ are New phones for Japanese learners of 
English. Both inexperienced and experienced (re: 
spoken English) Japanese would thus classify 
stimuli of the /w/-/j/ contrast according to two 
Japanese categories, resulting in good 
identification and discrimination. His model would also 
predict a shift in the category boundary (relative 
to Americans), reflecting differences between the 
Japanese and English /w/. The results of 
Experiment 1 are consistent with both expectations. For 
the /r-l/ series, inexperienced Japanese would be 
expected to have considerable difficulty, but 
experienced Japanese would show improved 
perception, reflecting the establishment of new phonetic 
categories. This was indeed the case in 
Experiment 1. In addition, the fact that the 
category boundary for experienced Japanese was not 
different from the Americans' supports the 
prediction that they had established new L2 
categories. However, predictions for the /w-r/ series 
are somewhat more difficult to generate from 
Flege's model. The model should predict good 
identification and discrimination of these stimuli 
by experienced Japanese, who should have formed 
a New L2 category for /r/ to contrast with the 
Similar category of /w/. Their performance levels 
should therefore equal those of the Americans. 
However, it is less clear how inexperienced 
Japanese should perform with /w-r/. Although 
they would be predicted to identify /w/ well, and /r/ 
poorly, their discrimination performance is more 
difficult to predict. Should their performance be 
poor because they have difficulty with the /r/ that 
has not yet been established as a New L2 
category, or should their performance be moderately 
good because they perceive /w/ as Similar and 
recognize that /r/ is different from /w/? In either case, 
we might expect, nonetheless, that discrimination 
performance would be lower for inexperienced 
Japanese than for Americans or for Japanese who 
are more experienced with spoken English. The 
shift in discrimination peak for the experienced 
Japanese toward the location of the American 
boundary in Experiment 1 suggests that those 
subjects may indeed have established a New /r/ 
category, which contrasts with the Similar /w/ 
category. Note, however, that the overall level of 
discrimination performance did not differ 
significantly among inexperienced Japanese, 
experienced Japanese and Americans, as would be 
predicted from Flege's model. 

Flege's model might also appear to address the 
existence of the intermediate category in the /w-j/ 
series, even for Japanese listeners, i.e., they may 
have begun to form a new /l/ category as a result 
of English experience. However, two observations 
are at odds with this possibility. First, there was 
no difference on that contrast between the 
Inexperienced Japanese, who had had very little 
experience with spoken American English at the 
time of testing, and the Experienced Japanese. 
Both groups provided equally strong evidence of 
perceiving the intermediate category in the /w-j/ 
series; the intermediate category in the 
double-peaked discrimination functions was no less clear 
for the Inexperienced Japanese than for the 
Experienced Japanese, or in fact for the 
Americans. Second, if even the Inexperienced 
Japanese were truly developing a new phonetic 
category on the basis of their limited English 
exposure, then we would expect this /l/ category 
to emerge in their responses to the /r-l/ series as 
well. Such was not the case. 

Flege's notion that L2 experience may lead to 
the formation of new phonetic categories is not 
incompatible with Best's perceptual assimilation 
model. The assumption that experience with 
spoken L2 may lead to a reorganization of perceptual 
assimilation of nonnative phones, in fact, 
motivated the comparison between the Japanese 
subgroups differing in English conversation training 
and experience. The assimilation model assumes 
that listeners are sensitive to degrees of similarity 
and dissimilarity between the nonnative and 
native phones. This is most obvious when there are 
category goodness differences in assimilation, or 
when the nonnative phones are non-assimilable. 
Indeed, adult L2 learners should be expected to 
form new phonetic categories most readily for L2 
phones perceived as discrepant exemplars of a 
native category, i.e., for the non-prototypical 
member of a contrast that is assimilated as a 
category goodness difference from a native phoneme. 
If no discrepancies are perceived between the L2 
and L1 phone (that is, for the L2 phone that is 
perceived as a good exemplar of the native 
phoneme), it should be quite difficult for the L2 
learner to form a new category. Conversely, if the 
L2 phone is so dissimilar from L1 phonemes that 
it cannot readily be related to any L1 category, we 
may expect the L2 learner to have some difficulty 
forming a new phonetic category, because a clear 
contrast between a specific familiar phoneme and 
an unfamiliar phone may be particularly 
informative to the learner. 

The one unexpected finding, that listeners from 
both language groups apparently discriminated a 
third, intermediate phonetic category between the 
two endpoint categories of the /w-j/ series, is 
consistent with the above suggestion. Experiment 
2 with a new group of American listeners verified 
that this third category was highly identifiable as 
/l/ (although it remains to be determined whether 
Japanese listeners at either level of English 
experience would reliably label those items as 
"L"). Although the Japanese language does not 
employ an /l/ phoneme, even the Inexperienced 
Japanese clearly distinguished a third phonetic 
category from the /w/ and /j/, according to the two 
marked peaks in their /w-j/ discrimination 
function, which was virtually identical to the 
discrimination functions of the two groups of 
Americans. This observation, together with the 
Inexperienced Japanese listeners' better 
discrimination performance on /w-r/ than on /r-l/, 
suggests the possibility that adults' recognition of 
the phonetic properties of a nonnative segment 
might be aided by direct comparison between 
exemplars of that segment presented in context 
with exemplars of the most similar (in 
articulatory-phonetic or acoustic-phonetic terms) 
native phoneme. That is, perceptual learning 
about the novel L2 segment may benefit from 
contextual comparisons which exemplify 
differences between the native phoneme and the 
nonnative phone that is perceived as a poorer 
exemplar of that familiar category. In the present 
context, Japanese listeners' recognition of a third 
category in the /w-j/ series, which was identified 
as /l/ by the Americans in Experiment 2, 
apparently benefited from its contrast to the 
flanking categories of Japanese /w/ and /j/, i.e., the 
intermediate, nonnative category constituted a 
noticeably poor fit to one or both of the familiar 
Japanese categories. While this observation is 
consistent with Flege's (1988; 1990) claim about 
the importance of similarity versus "Newness" of 
nonnative phones to the degree of perceptual 
adjustments to L2 learning, it is also compatible 
with the perceptual assimilation hypothesis that 
category goodness differences are relatively 
discriminable as a difference between the native 
category "ideal" and less-good exemplars. Further 
research is obviously needed to determine whether 
presenting a nonnative phone in juxtaposition to 
the most similar native phoneme contrast may 
actually improve perception of the new category. 

In either event, the data presented here are 
generally consistent with the suggestion that 
language-specific attunement of phonetic perception 
may remain somewhat malleable even in 
adulthood (see also Flege, 1988; MacKain et al., 1981; 
Pisoni et al., 1982; Strange & Dittmann, 1984; 
Tees & Werker, 1984; Werker & Tees, 1984). The 
subgroup of Japanese listeners who had had more 
intensive conversation experience with American 
English speakers showed greater similarities to 
the Americans than did the Inexperienced 
Japanese in their performance on all three 
stimulus series. Thus, English conversation experience 
may have shifted those Japanese listeners' 
categorization and discrimination toward the phonemic 
and phonetic properties of the approximant 
contrasts employed in American English. Note, 
however, that the performance of the Experienced 
Japanese was not identical to the Americans', 
instead falling intermediate between the latter 
group and the Inexperienced Japanese (see also 
Yamada & Tohkura, 1991). 

Further research is needed to determine which 
factors may influence adults' perceptual 
adjustments to the phonemic and phonetic properties of 
L2 segmental contrasts, and to what extent there 
may be limitations on such L2 influences in 
adulthood. It is important to recognize that we 
had no control over, or access to, the factors that 
led to the group differences in English 
conversation experience. For example, in our Japanese 
subgroups, level of English conversation 
experience may have been affected by individual 
differences in phonetic ability (recall the categorical /r-l/ 
performance of the Inexperienced Japanese 
subject M. K.: MacKain et al., 1981), by differences in 
the necessity of speaking English, by differences 
in motivation to use English "like a native," and/or 
by differences in the nature of exposure to English 
(e.g., traditional classroom vs. immersion 
program), in addition to duration and intensity of 
exposure to spoken English. Another factor that 
appears to have strong impact on an adult's 
ability to perceive a given nonnative contrast is 
whether the individual had any substantive 
exposure during early childhood to languages 
using that contrast (e.g., Flege, 1988; Tees & 
Werker, 1984). 

Although we cannot verify that the Japanese 
subgroup difference we found was due to 
differences in L2 experience in adulthood, rather than 
to earlier-occurring factors, several observations 
suggest the likelihood that the relevant experience 
with spoken L2 was limited to adulthood. Three of 
the Experienced Japanese had come to live in the 
U. S. as adults, the fourth at 19 years, all past the 
presumed "critical period" for language-learning 
which ends at puberty. All had begun intensive 
English conversation training either after their 
arrival in the U. S. or less than a year before they 
left Japan. Moreover, while most Japanese are 
formally taught English in school beginning at age 
12 years or earlier, the instructors are typically 
native Japanese rather than English speakers, 
and the emphasis is on reading/writing and not on 
speaking/hearing (Mochizuki, 1981; Yamada & 
Tohkura, 1991). Nonetheless, further research is 
needed to clarify the contribution of various 
factors to subgroup differences in perception of L2 
contrasts, including studies of longitudinal 
changes within a given group of listeners. 

Plausibility, Parsimony, and Theories of Speech* 



Alvin M. Liberman 



According to a somewhat unconventional view, speech is managed by a specialization for 
language (a phonetic module) at the level of action and perception. There, the processes 
and primitives are specifically phonetic, not, as is more commonly assumed, generally 
motor and auditory. The less conventional view is nevertheless the more plausible because 
it (1) better illuminates the biological nature of the difference between spoken and written 
forms of language, and (2) provides the better account of how speech meets the specific 
requirement of phonological communication that the elements be commutable, as well as 
the general requirement of all communication systems that there be parity between sender 
and receiver. Also relevant to the argument of plausibility is the fact that, while the 
phonetic module is unique to language, it is not without biological precedent, since it has 
important properties in common with such older (and better understood) specializations as 
stereopsis and sound localization. 



It is, for me, a happy privilege to be part of an 
occasion that honors Paul Bertelson, dear friend 
and valued colleague. As my contribution to the 
occasion, I offer a few reflections on a question I 
have often discussed with Paul: Is there a 
specialization for language at the precognitive 
level? Is there, in other words, a specifically 
linguistic mode of action and perception? Put in 
one form or another, this question goes to the 
heart of claims about the modular nature of 
linguistic processes. It arises wherever in 
language one happens to look, but it assumes 
what I take to be its most pointed manifestation 
at the level of phonetic structure. There lie two or 
three dozen consonants and vowels, familiar 
objects of a seemingly simple sort. Yet they are 
the elements of which all languages are made. 
Moreover, their proper use is a distinguishing 
mark of the human species and a principal 
component of its linguistic faculty. Accordingly, 
the question I raise about their management is a 
question about the biology of language. 

Together with some of my colleagues, including 
especially Ignatius Mattingly, I believe the answer 
to the question is yes: the biology of language 
does, indeed, incorporate a precognitive 
specialization for the production and perception of 
consonants and vowels, a specialization we have 
chosen to call a phonetic module. We take this 
module to be an integral part of the larger 
specialization for language, adopting what Fodor 
(1983) would characterize as a vertical view in 
which the relevant structures and processes are 
seen as specific to the linguistic function they 
serve. The opposite view, which is more widely 
held, is that speech is to be accounted for by the 
most general principles of motor activity and 
auditory perception; accordingly, this view is 
appropriately referred to as horizontal. 

My aim in this paper is to promote the less 
conventional vertical view, not by reference to the 
results of particular and putatively critical 
experiments, but rather by taking account, in very 
general form, of a few commonly neglected 
considerations that are relevant to its plausibility 
and parsimony. A fuller description of the vertical 
view, together with an account of the nature of its 
empirical support, is to be found elsewhere 
(Liberman & Mattingly, 1985; Liberman & 
Mattingly, 1989; Mattingly & Liberman, 1988). As 
will be seen there, this view comprehends both the 
production and perception of speech; indeed, it 
assumes an organic relation between the two. It 
happens, however, that the considerations I mean 
to offer in this paper are concerned primarily with 
perception, so I will bias the emphasis in that 
direction, as I do in the following brief account of 
the difference between the vertical view and its 
horizontal opposite. 

The horizontal view varies in its particulars 
from one theorist to another, but the basic 
assumptions are much the same. Thus, the 
several proponents are in agreement that 
perception of speech is no different from 
perception of other sounds (Ades, 1977; Bregman, 
1991; Cole & Scott, 1974; Crowder & Morton, 
1969; Diehl & Kluender, 1989a; Fujisaki & 
Kawashima, 1970; Howell & Rosen, 1984; Kuhl, 
1981; Lane, 1965; Lindblom, 1991; Miller, 1977; 
Oden & Massaro, 1978; Stevens, 1981). All such 
perception is supposed to depend on the same 
general processes of hearing, processes that 
occupy a common domain and evoke in a common 
sensory register a common set of auditory 
primitives, including, for example, pitch, loudness, 
and timbre. Of course, the perceptual 
representations of a stop consonant and, say, a 
squeaking door must be different, but the 
difference is supposed to be only in the relative 
values that are assigned to the primitives they 
have in common; there are no specifically phonetic 
primitives. Thus, the primary perceptual 
representations of speech are taken to be 
generally auditory, not specifically phonetic. That 
being so, proponents of the horizontal view are 
required to explain how, being independent of 
language, the auditory representations gain 
access to a system in which they are specifically 
marked for linguistic significance and used for a 
specifically linguistic purpose. 

Some proponents explicitly meet this 
requirement by supposing that, given the auditory 
percepts, the listener elevates them to linguistic 
status by attaching phonetic labels, fitting them to 
phonetic prototypes, or associating them with 
such cognitive units as distinctive features (Ades, 
1977; Crowder & Morton, 1969; Fujisaki & 
Kawashima, 1970; Pisoni, 1973; Rosen & Howell, 
1987; Stevens, 1975, 1989). Since these labels, 
prototypes, and features are neither acts nor 
percepts, they deserve to be called ideas. But 
whatever they are called, they are the end 
products of a cognitive translation that converts 
auditory percepts into a form appropriate to 
language. Getting from speech signal to the 
primary level of language is, therefore, a two- 
stage process: evocation of an auditory percept in 
the first stage, followed by conversion to a 
phonetic representation in the second. In this 
important respect, the horizontal view implausibly 
makes perceiving speech no different in principle 
from perceiving Morse code or, for that matter, the 
letters of the alphabet; in all cases, the perceiver 
must attribute linguistic significance to percepts 
that are not inherently linguistic (see Liberman, 
in press, for further discussion). 

There are at least two other assumptions of the 
horizontal view, but these are commonly left 
unsaid, though they are, to the vertical theorist, of 
great importance. One, which seems to be tacitly 
accepted, not as an assumption but as background 
fact, is that phonetic elements are sounds. The 
other, which is commonly unspoken because it 
must appear on this view to be irrelevant, is that 
the gestures and motor control processes of speech 
production are, like the processes of speech 
perception, independent of language. Presumably, 
language simply appropriated movements and 
motor mechanisms that are part of a general 
faculty for action, just as it appropriated for its 
own special purposes the general mechanisms of 
audition. It is, therefore, necessary for the 
speaker, just as it is for the listener, to make a 
cognitive translation between two very different 
kinds of representations, one linguistic, the other 
not. According to the horizontal view, then, it 
should not matter in this regard whether one 
produces language by speaking it, by operating a 
Morse-code key, or by wielding a pen. Putting this 
observation about production together with the 
earlier one about perception, we see that the 
horizontal view must fail in both domains to 
provide a plausible basis for distinguishing the 
biologically primary processes of speech from their 
obviously secondary extensions. 

The vertical view is different at all points. Seen 
vertically, apprehending phonetic structures is 
managed by a distinct, language-specific system 
that has its own phonetic domain, its own pho- 
netic mode of signal processing, and its own pho- 
netic primitives. Perception of phonetic structure 
is therefore precognitive, which is to say immedi- 
ate; there is no translation from a nonphonetic 
(auditory) representation because there is no such 
representation. It is, of course, in precisely this re- 
spect that perception of speech differs, plausibly, 
from perception of Morse code or of scripts. 

There are two other assumptions of the vertical 
view that contrast starkly with its conventional 
counterpart. One is that the elements of phonetic 
structure are gestures, not the sounds those ges- 
tures produce. These acts are, then, the ultimate 
constituents of language, the primitives that must 
be exchanged between speaker and listener if 
communication by language is to occur. The sec- 
ond assumption is that these gestures, as well as 
the processes that control them, are specifically 
phonetic, having evolved for phonological commu- 
nication and for nothing else. Unlike a Morse code 
operator or a writer, a speaker is directly using 
motor representations that are inherently linguis- 
tic. There is no need to connect a nonlinguistic act 
(pressing a key or writing an alphabetic character) 
to some linguistic unit of a cognitive sort. Nor is 
such a unit required, more generally, to serve as a 
common referent through which a nonlinguistic 
act and a correspondingly nonlinguistic (auditory) 
percept can be connected to each other. On the 
vertical view, the specifically phonetic gestures 
that are managed by the module in production are 
recovered by the module as the specifically pho- 
netic primitives of perception, thereby completing 
the communicative link without cognitive inter- 
vention, while also making speech an integral part 
of language, not, as on the horizontal view, an 
artifactual adjunct. 

Are there acoustic substitutes for speech? 

According to the horizontal view, speech 
percepts are supposed to be auditory in the same 
way that the percepts evoked by the letters of the 
alphabet are known to be visual. In the visual 
case, the only limit to the number and variety of 
optical shapes that can be made to serve as 
alphabetic characters is in the constraints 
imposed by the visual system, and they are few. 
Given the conventional view of speech, one would 
suppose that a similar situation would exist there. 
Of course, the auditory channel is neither so wide 
nor so deep as the visual, but, still, the number of 
sounds that can be identified is very great, so one 
should expect that it would be possible, even easy, 
to find alternative acoustic vehicles. 

The foregoing implication of the horizontal view 
is exactly what my colleagues and I tacitly 
accepted when, in 1945, we were enlisted in an 
attempt to build a device that would convert print 
into intelligible sound and so serve as a reading 
machine for the blind. We should, of course, have 
wanted a machine that would make the print 
speak English, but there were at the time no such 
things as optical character readers, and, even if 
there had been, we should not have known how to 
synthesize speech from their outputs. However, 
we considered this to be of no great consequence, 
for we could quite easily make the print control 
the parameters of various nonspeech sounds, and 
so produce an acoustic cipher differing only in 
detail from the speech to which the blind users 
were accustomed. Given our tacit assumptions 
about the nature of speech, we supposed that they 
would learn to connect these sounds to phonetic 
units, much as they had earlier done with the 
sounds of speech. 

A detailed account of our unsuccessful attempts 
to substitute nonspeech sounds for the sounds of 
speech would not be enlightening here, for it 
would only make the point that, try as we might, 
we did not come anywhere near to succeeding. Of 
course, we could not then, and cannot now, expect 
to test all possible sounds, nor could we readily 
arrange for people to have with nonspeech the 
amount of experience they must have had with 
speech. Still, we were then, as we are now, 
convinced that nonspeech sounds simply won't do, 
not just because they failed the tests we put them 
to, but because they failed in ways that made it 
plain why we should never have expected them to 
succeed. The difficulty was not primarily that the 
sounds were indiscriminable or unidentifiable, but 
rather that every arrangement we tried was 
defeated in one way or another by the variable of 
rate. Thus, we found that, as the rate of scan 
approached the lower bound of what would be 
even marginally acceptable in speech or in 
reading, performance (as measured by ability to 
learn a selected set of words) decreased 
appreciably. Worse yet, listeners lost the ability to 
identify the individual letter sounds and to 
apprehend their order, responding instead to some 
overall auditory pattern characteristic of the word. 
Thus, to the extent that the words could be 
learned at all, they had to be treated logo- 
phonically, as it were, with attention directed to 
the way the sound differed holistically from the 
sound for any other word. The tremendous 
advantage of the combinatorial principle that 
phonology exploits was therefore lost, and, given 
that a purely logographic system cannot really 
work very well even in reading (De Francis, 1989; 
Mattingly, 1991), one can imagine how vastly 
more unsuited it would be as a basis for speech 
perception. 

The final blow was dealt by our observation that 
when we ourselves undertook to master one of 
these nonspeech systems, we found little transfer 
of training across rates. Letters and words learned 
at one rate could not be recognized at other rates 
that were still within the range of what was 
reasonable if the machine was to have any utility. 
Words tended not only to become hard-to-analyze 
wholes, but the phenomenal nature of the whole 
changed quite drastically from one rate to 
another. A user would have been required, 
therefore, to learn a different set of associations 
for every significantly different rate. 

In hindsight, it is apparent that if we had ever 
bothered to think about the requirements of 
phonological communication, and then measured 
these against the known properties of the ear, we 
should have realized, without any research at all, 
that an acoustic-auditory strategy of the kind 
suggested by the horizontal view was bound to 
fail. The point is that phonological communication 
requires commutable, hence discrete and in- 
variant, representations. But if such invariance is 
to exist in the auditory domain, as it must on the 
view that we had unthinkingly adopted, then 
rates of transmission that are normal in speech 
would seriously strain and sometimes overreach 
the temporal resolving power of the ear and also 
its ability to perceive the order of the segments 
(Liberman, Cooper, & Studdert-Kennedy, 1968). 
(Speech production would be equally problematic, 
since invariant and discrete auditory percepts 
would require correspondingly invariant and 
discrete gestures, with the result that people could 
not really speak, they could only spell). 

But we had to learn the hard way, as it were, 
that nonspeech sounds — that is, sounds that do 
not approximate the results of linguistically 
significant gestures— cannot be efficient vehicles 
for language. It was, indeed, this painfully- 
arrived-at conclusion that initially motivated 
Frank Cooper and me to begin our speech 
research. Our aim, very simply, was to find out 
why the sounds of speech, but no others, can meet 
the commutability and rate requirements of 
phonological communication. The answer our 
research brought us to seems to me now so 
plausible, not to say obvious, that I wonder we did 
not arrive at it earlier, simply by thinking about 
the matter. For what it comes to is that evolution 
did not ever confront the problems of 
commutability and rate, simply because it avoided 
the acoustic-auditory strategy (of the horizontal 
view) that would have given rise to them. What 
evolved was a brilliantly successful strategy that 
defined the invariant elements of phonetic 
structure not as sounds, but as gestures. The 
critically important advantage of this strategy was 
that, given gestures that can somehow be 
characterized as remote structures of motor 
control, and given a mode of action specifically 
adapted to matching these to the needs of 
phonology, it was possible by overlapping and 
merging (that is, coarticulation) of the peripheral 
movements to achieve the high rates of production 
that characterize speech communication. 

As for perception, which was initially our single- 
minded concern, the advantage is that 
coarticulation effects parallel transmission of 
information about successive phonetic segments, 
and so relaxes the constraints on rate of per- 
ception that underlay the failure of our nonspeech 
reading machines. But this gain has an obvious 
cost, for coarticulation creates a complex relation 
between signal and message, a specifically 
phonetic code that is opaque except as the 
scientist or perceiving device can take account of 
the phonetically specific processes that produced 
it. Once research on speech had convinced us that 
this was so, we felt challenged to explain, if only 
in the most general terms, how listeners manage. 
We rejected the possibility that they break the 
code by some deliberate, cognitive process, 
preferring, instead, to suppose that they rely on a 
biologically coherent module specifically adapted 
to providing the articulatory key. But whatever 
the plausibility of this proposed solution, it was 
never plausible to suppose that perception of 
linguistic structure is so much controlled by 
general auditory processes that it can be achieved 
as well with sounds other than speech. That we 
nevertheless thought it was is testimony to the 
unquestioning faith we had in what was then, and 
is now, the received view. 

Whence comes the fit of perceptual form to 
phonological function? 

Given that the function of phonology is to use 
the combinatorial principle to generate a large 
number of words, the units must, as already 
noted, be discrete and invariant, which is to say 
categorical, as they are seen from a linguistic 
point of view. It is adaptive therefore that the 
units be correspondingly categorical in immediate 
perception. Listeners would only be disconcerted 
by the sense, if it should be their sense, that a 
particular phonetic token, X, lay half way between 
X and Y, or that it really sounded like Z, except as 
it was reinterpreted so as to take account of the 
fact that it was followed by A. Fortunately, 
listeners do not have either sense: the much- 
investigated peaks of discriminability at the 
acoustic boundaries of the phonetic unit reflect 
category-producing discontinuities in perception, 
and it is characteristic of phonetic perception that 
these categories remain stable across all context- 
conditioned variation in the stimulus. 

What, then, is the source of these stable 
perceptual categories? On the horizontal view, it 
must, of course, be in the properties of the 
auditory system. Accordingly, theorists of this 
persuasion take comfort in the experiments that 
find categories in the responses of nonhuman 
animals to speech and in the responses of human 
listeners to acoustic nonspeech analogues (Diehl & 
Walsh, 1989; Kluender, 1991; Kluender, Diehl, & 
Killeen, 1987; Kluender, Diehl, & Wright, 1988; 
Kuhl & Miller, 1975; Massaro, 1987; Parker, 1988; 
Parker, Diehl, & Kluender, 1986; Pastore, 1987; 
Pisoni, 1973; Pisoni, Carrell, & Gans, 1983). The 
opposite result is also found, much to the 
satisfaction of the vertical theorists, who must 
believe that this kind of categorical perception is 
specifically phonetic (Best, Morrongiello, & 
Robson, 1981; Best, Studdert-Kennedy, Manuel, & 
Rubin-Spitz, 1989; Mann & Liberman, 1983; 
Liberman, Isenberg, & Rakerd, 1981; Mattingly, 
Liberman, Syrdal, & Halwes, 1971; Sinnott, 1976; 
Waters & Wilson, 1976). However, I do not mean 
here to offer a critical evaluation of the 
experimental evidence pro and con the one 
assumption or the other, but, rather, in keeping 
with the spirit of this paper, to argue that the 
horizontal (auditory) interpretation is simply 
implausible on its face. 

It is relevant, first, to take into account how 
very great is the variation in stimulus for any 
given perceptual category (Repp & Liberman, 
1987). For all phones, there is variation as a 
function of phonetic context, position in the 
syllable, and vocal-tract size. In some cases, there 
are changes depending on articulatory rate and 
stress. And, of course, there are the differences 
that exist across languages. Indeed, so gross is 
this stimulus variation, and so numerous its 
sources, that it is impossible to estimate how very 
many alternative category boundaries the 
auditory system would need if the percepts were 
to be held constant, and implausible to suppose 
that these boundaries could exist in such 
numbers. Surely, they could not have been 
selected in the evolution of the auditory system 
just against the possibility that phonology would 
one day come along and find them useful. Yet, as 
properties of the auditory system, they serve no 
other imaginable purpose. Indeed, from an 
auditory standpoint, they would be dysfunctional, 
since they would necessarily distort the perception 
of nonspeech sounds. 

Even if one assumes, against all reason, that 
this numerous variety of boundaries does exist in 
the auditory system, is it plausible to suppose that 
coarticulatory maneuvers vary as they do with 
phonetic context and with rate just in order to 
produce sounds that match the way categories of 
the auditory system happen, independently of 
coarticulation, to adjust to variation in the 
acoustic stimulus? 

Moving, now, from implausibility to 
impossibility, I remark the fact that, as is well 
known, the articulation of every phonetic unit has 
multiple acoustic consequences, and that listeners 
are more or less sensitive to all of them. So, if 
speakers had somehow managed to produce a 
second-formant transition to fit some auditory 
category, what then would they do about the 
third-formant transition and the burst? The 
answer has got to be nothing, since it is not 
possible to control these acoustic consequences 
independently. 

It is also true of these multiple sources of infor- 
mation that, no matter how numerous and acous- 
tically various they may be, they nevertheless 
evoke a unitary, categorical percept. This equiva- 
lence of the acoustically very different components 
of the speech signal is reflected in, and measured 
by, the trading relations, so-called, that speech re- 
searchers report (Diehl & Kluender, 1989; Fitch, 
Halwes, Erickson, & Liberman, 1980; Repp, 1982). 
But one hardly needs experiments like those to 
make the point. For, surely, there is no doubt that 
there are multiple and acoustically very different 
sources of acoustic information for every phone, 
and it is common experience that the result is a 
unitary perceptual category, not a collage in which 
the several fragments represent the disparate au- 
ditory consequences of the different acoustic cues. 
Is it even conceivable that speakers produce these 
heterogeneous combinations of sounds by design, 
and that they do so because they once discovered 
that the auditory system just happens to cause 
them to evoke the same percept? It would, again, 
be dysfunctional if the auditory system did that, 
for it would effectively prevent the discrimination 
(or identification) of most ordinary acoustic 
events; indeed, it would tend to make all of them 
sound like speech. 

Nor can one reasonably suppose that such cate- 
gories as the auditory system apparently does 
have might somehow have served as starting 
points for the development of phonetic perception 
(Kuhl, 1981). Which contexts, rates, vocal-tract 
sizes, and languages might have been taken as the 
linguistic canon? And even if these auditory cate- 
gories are appropriate in some phonetic circum- 
stances, would they not be inappropriate, hence 
dysfunctional, in all others? Indeed, auditory cat- 
egories, to the extent that they exist, should make 
us the more convinced of the validity of the verti- 
cal view, since they require of the phonetic system 
that it be so independent as to ignore their poten- 
tially interfering representations. 

Is it not far more plausible to suppose about all 
these cases that the variable and multiple sources 
of information in the speech signal are simply the 
inevitable consequences of acts that are 
specifically adapted to a phonological function, 
and that perception is managed by a correspond- 
ing adaptation to those same acts and that same 
function? 

What is the place of speech in the 
biological scheme of things? 

If, as the horizontal view would have it, there is 
no specialization for language at the level of action 
and perception, then, as I have already implied, 
language must begin one step up, where, by a 
purely cognitive process, a select set of nonlinguis- 
tic representations is given a phonetic cast and so 
made appropriate for whatever specialized lan- 
guage processing the theorist wishes to assume. 
The same conclusion follows if the theorist should, 
by a seemingly logical extension, embrace the 
more broadly horizontal assumption that there is 
no specifically linguistic process at any level, that 
just as speech is merely one among many expres- 
sions of the general faculties of action and percep- 
tion, so does syntax fall out of a general faculty of 
cognition. On either version, however, it will be 
hard to provide a parsimonious answer to a fun- 
damental question about the biology of speech: 
how are the acts and percepts of speech marked in 
evolution for linguistic significance, and so set 
apart from all others? 

Perhaps the most explicit attempt to answer 
this question from a horizontal point of view has 
been made by Lindblom (1991) who says that 
"languages make their selection of phonetic 
gesture inventories under the strong influence of 
motor and perceptual constraints that are 
language independent and in no way special to 
speech (the functional adaptation of phonetic 
gestures)." Then, referring to the unconventional 
assumption that there are specializations at the 
level of perception and action, he says, "If so, why 
do inventories of vowels and consonants show 
evidence of being optimized with respect to motor 
and perceptual limitations that must be regarded 
as biologically general and not at all special to 
speaking and listening?" 

As a criticism of the vertical view, which is how 
it was intended, Lindblom's argument can be 
dismissed as irrelevant to the question that this 
view is designed to answer. That question is not 
whether language somehow evolved out of what 
was already there, for it could hardly have done 
otherwise, but, rather, what it was that evolved. 
Lindblom's answer is that there was, at the 
precognitive level, no evolution of anything, only a 
selection from among the possibilities offered by 
general faculties that were, and presumably still 
are, independent of language. Of course, that 
must have been exactly what happened in the 
development of, say, a cursive writing system, for 
surely the selection of its characters must have 
been strongly influenced by "motor and perceptual 
constraints that are language independent." But 
such an observation, true though it is, enlightens 
us not at all about the evolution of language, for 
what developed in the case of cursive writing were 
artifacts, not the biologically primary units of the 
language that those artifacts are taken to 
represent. Obviously, the artifacts can have been 
marked for linguistic significance only by 
agreement, not by the processes of biological 
evolution. It is up to each user, then, to honor the 
agreement by mastering, at a cognitive level, the 
wholly arbitrary connection between the selected 
characters and the primary units of the language. 
On Lindblom's account, the same must be said of 
speech and the speaker-listener. For if speech 
production and perception are not distinctly 
linguistic, the primary units of language must, as 
earlier noted, be in the nature of ideas— i.e., the 
labels, prototypes, distinctive features, etc. — to 
which the nonlinguistic representations of speech 
become connected. Such ideas might have been a 
result of the inventiveness that large brains and 
cognitive power make possible, in which case, the 
biology of speech would be the biology of large 
brains and cognitive power. Or, alternatively, they 
might have become part of the genetic inheritance 
of human beings, in which case the biology of 
speech would be the biology of innate ideas. In 
neither case would there be a place for speech in 
the biology of language. 

According to the vertical view, the biology of 
speech embraces specifically phonetic structures 
and processes that are adapted to specific linguis- 
tic functions. What evolved, on this view, was a 
special mode of communication (the phonological 
mode), that serves a distinctly linguistic function 
(the generation of a large vocabulary by use of the 
combinatorial principle), and imposes phonology- 
specific requirements (among which are the rapid 
production and perception of commutable ele- 
ments). The primitives of this mode are corre- 
spondingly special, being specifically linguistic 
and so appropriate for their role in the larger spe- 
cialization for language, including, for example, 
the syntactic component. On that basis, it seems 
plausible to suppose that the elements and pro- 
cesses of the phonological mode were selected ac- 
cording to their ability to meet its special re- 
quirements. On the side of action, I should think 
that an important factor was not ease of produc- 
tion as such, but rather the extent to which the 
gestures lent themselves to the coarticulatory ma- 
neuvers that effectively circumvent the con- 
straints on rate that would have been imposed 
had discrete gestures been produced seriatim. On 
the perceptual side, a decisive factor must have 
been the immense advantage conferred by a com- 
plex kind of parallel transmission that extends the 
limit on rate set by the temporal resolving power 
of the ear. It would appear then that, so far from 
being driven to exploit the strengths of the general 
motor and auditory systems, as Lindblom's com- 
ments imply, the evolution of speech must have 
been guided, rather, by the need to find ways 
around what must be seen, from a phonological 
point of view, as their weaknesses. It must also 
have been guided, even more generally, by the 
need to meet the requirement of parity by estab- 
lishing an identity between the communicative 
acts of the speaker and the communicative per- 
cepts of the listener. This it did by incorporating 
in the precognitive biology of speech the special 
mechanisms that allow articulatory gestures — the 
constituents of language that must be common to 
speaker and listener— to survive the rigors of the 
communicative exchange. 

It is also relevant to the plausibility of a theory 
of speech to expose, among its biological 
implications, the relation of speech to other forms 
of natural communication. On any theory, the gulf 
between speech and other systems must, of 
course, be seen to be very wide, though one would 
surely be inclined to look with favor on a theory 
that nevertheless managed some kind of bridge. It 
therefore counts against the horizontal view that 
it fails to do that. For if there is no precognitive 
specialization for speech, then, as has been noted 
several times already, speech must be matched to 
phonetic ideas. The horizontal theorists 
apparently find that consequence acceptable as it 
applies to human beings and their language. But 
would they not hesitate to extend it to the 
nonhuman case? Presumably, they would, given 
the abundant evidence that nonhuman 
communication is underlain by specializations for 
producing and perceiving specifically communica- 
tive signals of one sort or another. Are we to 
suppose, then, that unlike the nonhuman animals, 
which communicate as they do because of the 
nature of their precognitive specializations, we 
humans speak because, having risen above that 
mean level, we take advantage of innate ideas and 
intelligence? The vertical view, on the other hand, 
permits us to see that we and the other creatures 
are all precognitively specialized for commu- 
nication; the important difference is that our 
specialization comprises a phonology and a 
syntax, while theirs does not. 

There remains the biologically relevant 
question: What more general phenomena are 
exemplified by the processes of speech? Here, the 
horizontal view might appear to have the 
advantage, since it takes speech production and 
perception to be not different from other forms of 
action and perception. Accordingly, speech 
processes are as general as those that manage all 
of auditory perception and all of motor activity. 
The vertical view, on the other hand, abjures this 
kind of generality, holding that speech processes 
are specific to the linguistic function they serve. 
Indeed, it is precisely on this score that the 
unconventional view has been criticized as 
unparsimonious. As I have already tried to show, 
however, it is just because of the assumption 
about special processes that the unconventional 
view is the more parsimonious, since assuming 
another precognitive specialization is presumably 
less in need of Occam's razor than assuming a set 
of innate phonetic ideas. 

At all events, assuming a specialization for 
speech is no more unparsimonious than making 
the corresponding assumption for other systems 
that are biologically adapted to stimulus events 
and properties that are of great ecological 
significance to the species. Consider, for example, 
echolocation in the bat, sound localization in the 
barn owl, song in the bird, or, indeed, stereopsis in 
the human. Like the speech specialization as 
characterized by the vertical view, each of these is 
to be understood only by reference to the special 
mechanisms by which it serves its special 
function. While each system is therefore different 
from every other, they have in common the 
properties that Fodor has identified as 
characteristic of the modules that he takes as the 
functional elements of the precognitive mind. 
Moreover, the specializations named above have 
in common with each other and with speech that 
they all belong to a class of modules called 'closed' 
by Mattingly and me, and claimed by us to share 
the following properties (Liberman & Mattingly, 
1989; Mattingly & Liberman, 1988). 

(1) The representations are heteromorphic. That 
is, the dimensions of the percept are in- 
commensurate with the dimensions of the stim- 
ulus. Thus, in stereoscopic vision, the viewer 
perceives heteromorphic depth, not homomorphic 
disparity (doubling of images). In speech, the 
listener perceives, heteromorphically, a string of 
discrete consonants and vowels, not the 
continuously varying timbres (chirps, whistles, 
bleats, etc.) that constitute the homomorphic 
representations of the continuously changing 
formant tracks. 

(2) The modules preempt the stimulus infor- 
mation that is of interest to them, using it to form 
the heteromorphic percept, while leaving none for 
the homomorphic counterpart (Bentin & Mann, 
1990; Liberman & Mattingly, 1989; Whalen & 
Liberman, 1987). Thus, over a range of binocular 
disparities, the viewer perceives depth; disparity 
is not also seen. In a similar way, listeners 
perceive phonetic structures, not phonetic 
structures and also the homomorphic chirps and 
whistles that the components of the acoustic 
signal would otherwise represent. 

(3) The modules are highly plastic, which allows 
them to be calibrated and recalibrated by relevant 
environmental conditions that accumulate over 
time, or that change, whether naturally or by 
design of an experimenter (Knudsen, 1988). Thus, 
stereopsis adjusts at the precognitive level to the 
changes in binocular disparity that occur as the 
child's head grows bigger. The phonetic module is 
similarly calibrated over time according to the 
phonetic environment to which it is exposed. At all 
events, the plasticity of these modules is so great 
that they accommodate stimulus patterns that fall 
some distance beyond what is possible 
ecologically. Thus, viewers perceive depth with 
disparities far greater than could ever be provided 
by the distance between the eyes. Phonetic 
perception is possible with a wide variety of 
departures from the normal acoustic structure of 
speech, including even sine-wave analogs of the 
formant tracks. 

(4) When the limit of plasticity is exceeded, 
preemptiveness fails, with the result that het- 
eromorphic and homomorphic representations are 
evoked simultaneously. Thus, in stereopsis, as the 
disparity is progressively increased, a point is 
reached at which the viewer sees heteromorphic 
depth but also homomorphic disparity. In speech, 
as the experimenter introduces a discordance or 
discontinuity between two parts of the signal, a 
point is reached at which the listener perceives 
the heteromorphic structure but also the chirps, 
whistles, or bleats that constitute the homo- 
morphic representation. As it occurs in speech, 
this phenomenon has come to be known as 'duplex 
perception' (Bentin & Mann, 1990; Liberman, 
Isenberg, & Rakerd, 1981; Mann & Liberman, 
1983; Rand, 1974; Whalen & Liberman, 1987). 
The point to be made here is simply that duplex 
perception is not a freak phenomenon, limited to 
speech, but is, rather, what happens to a closed 
module when, as a consequence of limits on its 
plasticity, it can no longer preempt the stimulus 
information. 

(5) In the case of stereopsis, it has been shown 
that, as the disparity is increased over the range 
of duplex perception, the heteromorphic percept 
progressively diminishes while the homomorphic 
percept grows until, finally, only the homomorphic 
percept is represented (Richards, 1971). (It is as if 
there were a conservation of stimulus information: 
some, or all, of the information goes to form the 
one percept, the remainder goes to the other, and 
vice versa. Is there, perhaps, some imaginable 
sense in which the perceptual 'sum' can be said to 
remain constant?) Mattingly, Yi Xu, and I are 
currently testing the hypothesis that duplex 
perception in speech follows a course similar to 
that found in the duplex range of stereopsis. But 
whatever the outcome of this test, there is already 
considerable evidence for the conclusion that the 
properties of the phonetic module are similar to 
those that characterize other biological special- 
izations for perception. 

In the domain of speech, there are, then, two 
quite different kinds of biological generality, one 
for each theory. The horizontal theory claims
generality by associating speech with processes
that cut across a variety of perceptual, motor, and
cognitive functions. The vertical view finds it in
the integral relation of speech to language and in
the resemblance of speech to other specializations
at the precognitive level. The question, then, is not
which theory relates speech more generally to
other aspects of biology but rather which kind of
generality corresponds more closely to the true
state of affairs.

The vertical view of speech (that the
constituents are gestures, not sounds, and that these
constituents are managed by a phonetic
specialization) is apparently rejected by most
students of speech as implausible and
unparsimonious: implausible, because it flies in the face of
the common-sense observation that speech
consists of sounds that fall on the ear and
therefore excite the auditory system; unparsimonious,
because it requires the assumption of a distinct
and hitherto unacknowledged mode of action and
perception. My aim in this paper has been to show
that the shoe is on the other foot. The general
form of the argument is that the horizontal view is
implausible because the nonlinguistic modalities
of action and perception it relies on are manifestly
ill suited to the special requirements of
phonological communication; it is unparsimonious
because it requires cognitive processes of one sort
or another if the general auditory and motor units
of speech are to be connected to language. The
vertical view is designed to avoid these flaws.

The Relation of Speech to Reading and Writing*



Alvin M. Liberman



The difference in naturalness between speech and reading/writing is an important fact for
the psychology of language and the obvious point of departure for understanding the
processes of literacy, yet it cannot be accounted for by the conventional theory of speech.
Because this theory allows no linguistic specialization at the level of perception and action,
it necessarily implies that the primary representations of speech are just like those of
reading/writing: neither is specifically linguistic, hence both must first be translated into
linguistic form if they are to serve a linguistic function. Thus, the effect of the conventional
theory is to put speech and reading/writing at the same cognitive remove from language
and so make them equally unnatural.

A less conventional view shows the primary motor and perceptual representations of
speech to be specifically phonetic, the automatic results of a precognitive specialization for
phonological communication. Accordingly, these representations are naturally appropriate
for language, requiring no cognitive translation to make them so; in this important respect
they differ from the representations of reading/writing. Understanding the source of this
difference helps us to see what must be done if readers and writers are to exploit their
natural language faculty; why reading and writing should be at least a little difficult for
all; and why they might be very difficult for some.



Theories of reading/writing and theories of 
speech typically have in common that neither 
takes proper account of an obvious fact about 
language that must, in any reckoning, be critically 
relevant to both: there is a vast difference in 
naturalness (hence ease of use) between its 
spoken and written forms. In my view, a theory of 
reading should begin with this fact, but only after 
a theory of speech has explained it. 

My aim, then, is to say how well the difference 
in naturalness is illuminated by each of two 
theories of speech, one conventional, the other
less so, and then, in that light, to weigh the
contribution that each of these can make to an
understanding of reading and writing and the
difficulties that attend them. More broadly, I aim
to promote the notion that a theory of speech and 
a theory of reading/writing are inseparable, and 
that the validity of the one is measured, in no 
small part, by its fit to the other. 

WHAT DOES IT MEAN TO SAY THAT 
SPEECH IS MORE NATURAL? 

The difference in naturalness between the 
spoken and written forms of language is patent, so 
I run the risk of being tedious if I elaborate it 
here. Still, it is important for the argument I 
mean to make that we have explicitly in mind how 
variously the difference manifests itself. Let me, 
therefore, count the ways. 

(1) Speech is universal. Every community of 
human beings has a fully developed spoken 
language. Reading and writing, on the other hand, 
are relatively rare. Many, perhaps most, 
languages do not even have a written form, and 
when, as in modern times, a writing system is
devised (usually by missionaries), it does not
readily come into common use.

(2) Speech is older in the history of our species.
Indeed, it is presumably as old as we are, having
emerged with us as perhaps the most important of
our species-typical characteristics. Writing
systems, on the other hand, are developments of
the last few thousand years.

(3) Speech is earlier in the history of the 
individual; reading/writing come later, if at all.

(4) Speech must, of course, be learned, but it
need not be taught. For learning to speak, the
necessary and sufficient conditions are but two: 
membership in the human race and exposure to a 
mother tongue. Indeed, given that these two 
conditions are met, there is scarcely any way that 
the development of speech can be prevented. 
Thus, learning to speak is a precognitive process, 
much like learning to perceive visual depth and 
distance or the location of sound. In contrast, 
reading and writing require to be taught, though,
given the right ability, motivation, and 
opportunity, some will infer the relation of script 
to language and thus teach themselves. But, 
however learned, reading/writing is an intellectual 
achievement in a way that learning to speak is 
not. 

(5) There are brain mechanisms that evolved 
with language and that are, accordingly, largely 
dedicated to its processes. Reading and writing 
presumably engage at least some of these 
mechanisms, but they must also exploit others 
that evolved to serve nonlinguistic functions. 
There is no specialization for reading/writing as
such.

(6) Spoken language has the critically important
property of 'openness': unlike nonhuman systems 
of communication, speech is capable of expressing 
and conveying an indefinitely numerous variety of 
messages. A script can share this property, but 
only to the extent that it somehow transcribes its 
spoken-language base. Having no independent 
existence, a proper (open) script is narrowly 
constrained by the nature of its spoken-language 
roots and by the mental resources on which they
draw. Still, within these constraints, scripts are
more variable than speech.

One dimension of variation is the level at which 
the message is represented, though the range of 
that variation is, in fact, much narrower than the 
variety of possible written forms would suggest. 
Thus, as DeFrancis (1989) convincingly argues,
any script that communicates meanings or ideas
directly, as in ideograms, for example, is doomed
to arrive at a dead end. Ideographic scripts cannot
be open (that is, they cannot generate novel
messages), and the number of messages they can
convey is never more than the inventory of one-to-
one associations between (holistically different)
signals and distinctly different meanings that
human beings can master. Indeed, it is a
distinguishing characteristic of language, and a
necessary condition of its openness, that it
communicates meanings indirectly, via specifically
linguistic structures and processes, including,
nontrivially, those of the phonological component.
Not surprisingly, scripts must follow suit; in the
matter of language, as with so many other natural
processes, it is hard to improve on nature.

Constraints of a different kind apply at the 
lower levels. Thus, the acoustic signal, as 
represented visually by a spectrogram, for 
example, cannot serve as a basis for a script; while 
spectrograms can be puzzled out by experts, they, 
along with other visual representations, cannot be
read fluently. The reason is not primarily that the
relevant parts of the signal are insufficiently 
visible; it is, rather, that, owing to the nature of 
speech, and especially to the coarticulation that is 
central to it, the relation between acoustic signal 
and message is complex in ways that defeat 
whatever cognitive processes the 'reader' brings to
bear. Narrow phonetic transcriptions are easier to 
read, but there is still more context-, rate-, and 
speaker-conditioned variation than the eye is 
comfortable with. In any case, no extant script
offers language at a narrow phonetic level. To be
usable, scripts must, apparently, be pitched at the
more abstract phonological and morphophonological
levels. That being so, and given that
reading/writing requires conscious awareness of the units
represented by the script, we can infer that people
can become conscious of phonemes and morpho- 
phonemes. We can also infer about these units 
that, standing above so much of the acoustic and 
phonetic variability, they correspond approxi- 
mately to the invariant forms in which words are 
presumably stored in the speaker's lexicon. A 
script that captures this invariance is surely off to 
a good start. At all events, some scripts (e.g., 
Finnish, Serbo-Croatian) do approximate to purely
phonological renditions of the language, while 
others depart from a phonological base in the di- 
rection of morphology. Thus, English script is 
rather highly morphophonological, Chinese even 
more so. But, as DeFrancis (1989; see also Wang,
1981) makes abundantly clear, all these scripts, 
including even the Chinese, are significantly 
phonological, and, in his view, they would fail if 
they were not; the variation is simply in the de- 
gree to which some of the morphology is also 
represented.

Scripts also vary somewhat, as speech does not,
in the size of the linguistic segments they take as
their elements, but here, too, the choice is quite
constrained. Surely, it would not do to make a
unit of the script equal to a phoneme and a half, a
third of a syllable, or some arbitrary stretch (say,
100 milliseconds) of the speech stream. Still,
scripts can and do take as their irreducible units
either phonemes or syllables, so in this respect,
too, they are more diverse than speech.

(7) All of the foregoing differences are, of course, 
merely reflections of one underlying
circumstance: namely, that speech is a product of
biological evolution, while writing systems are
artifacts. Indeed, an alphabet, the writing system
that is of most immediate concern to us, is a
triumph of applied biology, part discovery, part
invention. The discovery, surely one of the most
momentous of all time, was that words do not
differ from each other holistically, but rather by
the particular arrangement of a small inventory of
the meaningless units they comprise.
The invention was simply the notion that if each 
of these units were to be represented by a 
distinctive optical shape, then everyone could read 
and write, provided he knew the language and 
was conscious of the internal phonological 
structure of its words. 

HOW IS THE DIFFERENCE IN 
NATURALNESS TO BE UNDERSTOOD? 

Having seen in how far speech is more natural 
than reading/writing, we should look first for a 
simple explanation, one that is to be seen in the 
surface appearance of the two processes. But 
when we search there, we are led to conclude, in
defiance of the most obvious facts, that the
advantage must lie with reading/writing, not with
speech. Thus, it is the eye, not the ear, that is the
better receptor; the hand, not the tongue, that is
the more versatile effector; the print, not the
sound, that offers the better signal-to-noise ratio;
and the discrete alphabetic characters, not the
nearly continuous and elaborately context-
conditioned acoustic signal, that offer the more
straightforward relation to the language. To
resolve this seeming paradox and understand the
issue more clearly, we shall have to look more
deeply into the biology of speech. To that end, I
turn to two views of speech to see what each has
to offer. 

The conventional view of speech as a basis 
for understanding the difference in 
naturalness. The first assumption of the
conventional view is so much taken for granted
that it is rarely made explicit. It is, very simply,
that the phonetic elements are defined as sounds.
This is not merely to say the obvious, which is
that speech is conveyed by an acoustic medium,
but rather to suppose, in a phrase made famous by 
Marshall McLuhan, that the medium is the 
message. 

The second assumption, which concerns the 
production of these sounds, is also usually 
unspoken, not just because it is taken for granted, 
though it surely is, but also because it is 
apparently not thought by conventional theorists 
to be even relevant. But, whatever the reason, one
finds among the conventional claims none which
implies the existence of a phonetic mode of
action, that is, a mode adapted to phonetic
purposes and no other. One therefore infers that
the conventional view must hold (by default, as it
were) that no such mode exists. Put affirmatively,
the conventional assumption is that speech is
produced by motor processes and movements that
are independent of language.

The third assumption concerns the perception of 
speech sounds, and, unlike the first two, is made 
explicitly and at great length (Cole & Scott, 1974; 
Crowder & Morton, 1969; Diehl & Kluender, 1989; 
Fujisaki & Kawashima, 1970; Kuhl, 1981; Miller,
1977; Oden & Massaro, 1978; Stevens, 1975). In
its simplest form, it is that perception of speech is
not different from perception of other sounds; all
are governed by the same general processes of the
auditory system. Thus, language simply accepts
representations made available to it by perceptual
processes that are generally auditory, not
specifically linguistic. So, just as language
presumably recruits ordinary motor processes for
its own purposes, so, too, does it recruit the 
ordinary processes of auditory perception; at the 
level of perception, as well as action, there is, on 
the conventional view, no specialization for 
language. 

The fourth assumption is required by the second 
and third. For if the acts and percepts of speech 
are not, by their nature, specifically phonetic, they
must necessarily be made so, and that can be done 
only by a process of cognitive translation. 
Presumably, that is why conventional theorists 
say about speech perception that after the listener 
has apprehended the auditory representation he 
must elevate it to linguistic status by attaching a 
phonetic label (Crowder & Morton, 1969; Fujisaki
& Kawashima, 1970; Pisoni, 1973), fitting it to a
phonetic prototype (Massaro, 1987; Oden &
Massaro, 1978), or associating it with some other
linguistically significant entity, such as a
'distinctive feature' (Stevens, 1975).

I note, parenthetically, that this conventional 
way of thinking about speech is heir to two related
traditions in the psychology of perception. One,
which traces its origins to Aristotle's enumeration
of the five senses, requires of a perceptual mode
that it have an end organ specifically devoted to
its interests. Thus, ears yield an auditory mode;
eyes, a visual mode; the nose, an olfactory mode;
and so on. Lacking an end organ of its very own,
speech cannot, therefore, be a mode. In that case,
phonetic percepts cannot be the immediate objects
of perception; they can only be perceived
secondarily, as the result of a cognitive association
between a primary auditory representation
appropriate to the acoustic stimulus that excites
the ear (and hence the auditory mode) and, on the
other hand, some cognitive form of a linguistic
unit. Such an assumption is, of course, perfectly
consistent with another tradition in psychology,
one that goes back at least to the beginning of the
18th century, where it is claimed in Berkeley's
'New Theory of Vision' (1709) that depth (which
cannot be projected directly onto a two-
dimensional retina) is perceived by associating
sensations of muscular strain (caused by the
convergence of the eyes as they fixate objects at
various distances) with the experience of distance.
In the conventional view of speech, as in
Berkeley's assumption about visual depth,
apprehending the event or property is a matter of
perceiving one thing and calling it something else.

Some of my colleagues and I have long argued 
that the conventional assumptions fail to account 
for the important facts about speech. Here, 
however, my concern is only with the extent to 
which they enlighten us about the relation of 
spoken language to its written derivative. That 
the conventional view enlightens us not at all 
becomes apparent when one sees that, in 
contradiction of all the differences I earlier 
enumerated, it leads to the conclusion that speech 
and reading/writing must be equally natural. To 
see how comfortably the conventional view sits 
with an (erroneous) assumption that speech and 
reading/writing are psychologically equivalent, 
one need only reconsider the four assumptions of 
that view, substituting, where appropriate, 
'optical' for 'acoustic' or 'visual' for 'auditory.'

One sees then, that, just as the phonetic 
elements of speech are, by the first of the 
conventional assumptions, defined as sounds, the 
elements of a writing system can only be defined 
as optical shapes. As for the second assumption
(viz., that speech production is managed by motor
processes of the most general sort), we must
suppose that this is exactly true for writing; by no
stretch of the imagination can it be supposed that
the writer's movements are the output of an action
mode that is specifically linguistic. The third
assumption of the conventional view of speech also
finds its parallel in reading/writing, for, surely,
the percepts evoked by the optical characters are
ordinarily visual in the same way that the
percepts evoked by the sounds of speech are
supposed to be ordinarily auditory. Thus, at the
level of action and perception, there is in
reading/writing, as there is assumed to be in
speech, no specifically linguistic mode. For speech,
that is only an assumption (and, as I think, a
very wrong one), but for reading/writing it is an
incontrovertible fact; the acts and percepts of 
reading/writing did not evolve as part of the 
specialization for language, hence they cannot 
belong to a natural linguistic mode. 

The consequence of all this is that the fourth of 
the conventional assumptions about speech is, in 
fact, necessary for reading/writing and applies 
perfectly to it: like the ordinary, nonlinguistic 
auditory and motor representations according to
the conventional view of speech, the correspondingly
ordinary visual and motor representations of
reading/writing must somehow be made relevant
to language, and that can only be done by a
cognitive process; the reader/writer simply has to
learn that certain shapes refer to units of the
language and that others do not.

It is this last assumption that most clearly
reveals the flaw that makes the conventional view
useless as a basis for understanding the most
important difference between speech and
reading/writing: namely, that the evolution of the one
is biological, the other cultural. To appreciate the
nature of this shortcoming, we must first consider
how either mode of language transmission meets a
requirement that is imposed on every
communication system, whatever its nature and
the course of its development. This requirement,
which is commonly ignored in arguments about
the nature of speech, is that the parties to the
message exchange must be bound by a common
understanding about which signals, or which
aspects of which signals, have communicative
significance; only then can communication
succeed. Mattingly and I have called this the
requirement for 'parity' (Liberman & Mattingly,
1985; Liberman & Mattingly, 1989; Mattingly &
Liberman, 1988). One asks, then, what is entailed
by parity as the system develops in the species
and as it is realized in the normal communicative
act.

In the development of writing systems, the 
answer is simple and beyond dispute: parity was 
established by agreement. Thus, all who use an 
alphabet are parties to a compact that prescribes 
just which optical shapes are to be taken as 
symbols for which phonological units, the
association of the one with the other having been 
determined arbitrarily. Indeed, this is what it 
means to say that writing systems are artifacts, 
and that the child's learning the linguistic 
significance of the characters of the script is a
cognitive activity. 

Unfortunately for the validity of the 
conventional assumptions, they require that the 
same story be told about the development of parity 
in speech. For if the acts and percepts of speech 
are, as the conventional assumption would have
it, ordinarily motor and ordinarily auditory, one
must ask how, why, when, and by whom they
were invested with linguistic significance. Where is
it written that the gesture and percept we know
as [b] should count for language, but that a
clapping of the hands should not? Is there
somewhere a commandment that says, 'Thou shalt
not commit [b] except when it is thy clear
intention to communicate'? Or are we to assume,
just as absurdly, that [b] was incorporated into 
the language by agreement? It is hard to see how 
the conventional view of speech can be made to 
provide a basis for understanding the all- 
important difference in evolutionary status 
between speech and reading/writing. 

The problem is the worse confounded when we 
take account of both sides of the normal
communicative act. For, on the conventional view, the
speaker deals in representations of a generally
motor sort and the listener in representations of a
generally auditory sort. What is it, then, that
these two representations have in common, except
that neither has anything to do with language?
One must thus suppose for speech, as for writing
and reading, that there is something like a
phonetic idea (a cognitive representation of some
kind) to connect these representations to each
other and to language, and so to make 
communication possible. 

Thus it is that at every biological or
psychological turn the conventional view of speech
makes reading and writing the equivalents of
speech perception and production. Since these
processes are plainly not equivalent, the
conventional view of speech can hardly be the
starting point for an account of reading and
writing.

The unconventional view of speech as a
basis for understanding the difference in 
naturalness. The first assumption of the 
unconventional view is that the units of speech 
are defined as gestures, not as the sounds that 
those gestures produce. (For recent accounts of the 
unconventional view, see: Liberman & Mattingly, 
1985; Liberman & Mattingly, 1989; Mattingly &
Liberman, 1988; Mattingly & Liberman, 1990). 
The rationale for this assumption is to be 
understood by taking account of the function of 
the phonological component of the grammar and 
of the requirements it imposes. As for the function 
of phonology, it is, of course, to form words by 
combining and permuting a few dozen 
meaningless segments, and so to make possible a 
lexicon tens of thousands of times larger than
could ever have been achieved if, as in all natural
but nonhuman communication systems, each
'word' were conveyed by a signal that was
holistically different from all others. But phonol- 
ogy can serve this critically important function 
only if its elements are commutable; and if they 
are to be commutable, they must be discrete and 
invariant. 
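
The combinatorial point can be made concrete with a small calculation. The sketch below assumes, purely for illustration, an inventory of 40 segments and words of at most 5 segments; neither figure comes from the text, and the count ignores the phonotactic restrictions that rule out most strings in any real language.

```python
def possible_strings(inventory_size: int, max_length: int) -> int:
    """Count every ordered string of 1..max_length segments drawn
    (with repetition) from an inventory of the given size."""
    return sum(inventory_size ** n for n in range(1, max_length + 1))


# Hypothetical values, chosen only to illustrate the scale involved.
INVENTORY = 40  # "a few dozen" meaningless segments (assumed)
MAX_LEN = 5     # longest word considered, in segments (assumed)

if __name__ == "__main__":
    total = possible_strings(INVENTORY, MAX_LEN)
    print(f"{INVENTORY} segments, up to {MAX_LEN} per word: {total:,} strings")
```

Even after most of these strings are discarded as unpronounceable or unused, the number that remains is far larger than any inventory of holistically distinct signals could supply.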

A related requirement has to do with rate, for if 
all utterances are to be formed by variously 
stringing together an exiguous set of signal 
elements, then, inevitably, the strings must run to 
great lengths. It is essential, therefore, if these 
strings are to be organized into words and 
sentences, that they be produced and perceived at 
reasonable speed. But if the auditory percepts of 
the conventional view are to be discrete and 
invariant, the sounds and gestures must be 
discrete and invariant, too. Such sounds and 
gestures are possible, of course, but only at the 
expense of rate. Thus one could not, on the 
conventional view, say 'bag,' but only [bə] [a] [gə],
and to say [bə] [a] [gə] is not to speak but to spell.
Of course, if speech were like that, then everyone 
who could speak or perceive a word would know 
exactly how to write and read it, provided only 
that he had managed the trivial task of 
memorizing the letter-to-sound correspondences. 
The problem is that there would be no language 
worth writing or reading. 

There seems, indeed, no way to solve the rate 
problem and still somehow preserve the acoustic- 
auditory strategy of the conventional view. It 
would not have helped, for example, if Nature had 
abandoned the vocal tract and equipped her
human creatures with acoustic devices adapted to
producing a rapid sequence of sounds (a drumfire
or tattoo), for that strategy would have defeated
the ear. The point is that speech proceeds at rates
that transmit up to 15 or even 20 phonemes per
second, but if each phoneme were represented by
a discrete sound, then rates that high would
seriously strain and sometimes overreach the 
ability of the ear to resolve the individual sounds 
and to divine their order. 
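
The arithmetic behind these rates is simple enough to spell out. The sketch below takes the two rates cited above and computes how long each phoneme would last if, as the conventional view requires, every phoneme occupied its own discrete stretch of sound; the function name is a hypothetical one introduced for the illustration.

```python
def ms_per_phoneme(phonemes_per_second: float) -> float:
    """Average time available to each phoneme, in milliseconds,
    assuming the phonemes are strictly sequential and do not overlap."""
    return 1000.0 / phonemes_per_second


if __name__ == "__main__":
    for rate in (15, 20):  # the rates cited in the text
        print(f"{rate} phonemes/s -> {ms_per_phoneme(rate):.1f} ms per phoneme")
```

On that assumption each sound would last only some tens of milliseconds, which is the sense in which such rates would strain the ear's ability to resolve individual sounds and their order.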

According to the unconventional view, Nature
solved the problem by avoiding the acoustic-
auditory strategy that would have created it. The
alternative she chose was to define the phonetic
elements as gestures, as the first assumption of
the unconventional view proposes. Thus, [b] is a
closing at the lips, [h] an opening at the glottis, [p]
a combination of lip closing and glottis opening,
and so forth. In fact, the gestures are far more
complex than this, for a gesture usually comprises
movements of several articulators, and these
movements are exquisitely context-conditioned.
Given such complications, I must wait on others to
discover how best to characterize these gestures
and how to derive the articulatory movements
from them. But while I'm waiting, I can be
reasonably sure that the unconventional view
heads the theoretical enterprise in the right
direction, for it permits coarticulation. That is, it
permits the speaker to overlap gestures that are
produced by different organs (for example, the
lips and the tongue in [ba]) and to merge gestures
that are produced by different parts of the same
organ (for example, the tip and body of the
tongue, as in [da]), and so to achieve the high
rates that are common. 

But the gestures that are coarticulated, and the 
means for controlling them, were not lying 
conveniently to hand, just waiting to be 
appropriated by language, which brings us to the 
second assumption of the unconventional view: 
the gestures of speech and their controls are 
specifically phonetic, having been adapted for 
language and for nothing else. As for the gestures 
themselves, they are distinct as a class from those 
movements of the same organs that are used for 
such nonlinguistic purposes as swallowing, 
moving food around in the mouth, licking the lips, 
and so on. Presumably, they were selected in the 
evolution of speech in large part because of the 
ease with which they lent themselves to being 
coarticulated. But the control and coordination of 
these gestures is specific to speech, too. For 
coarticulation must walk a fine line, being
constrained on either side by the special demands
of phonological communication. Thus,
coarticulation must produce enough overlap and merging
to permit the high rates of phonetic segment 
production that do, in fact, occur, while yet 
preserving the details of phonetic structure. 

The third assumption of the unconventional 
view is that, just as there is a specialization for 
the production of phonetic structures, so, too, is 
there a specialization for their perception. Indeed,
the two are but complementary aspects of the 
same specialization, one for deriving the 
articulatory movements from the (abstract)
specification of the gestures, the other for 
processing the acoustic signals so as to recover the 
coarticulated gestures that are its distal cause. 
The rationale for this assumption about 
perception arises out of the consequences of the 
fact that coarticulation folds information about 
several gestures into a single piece of sound, 
thereby conveying the information in parallel. 
This is of critical importance for language because 
it relaxes by a large factor the constraint on rate 
of phonetic-segment perception that is set by the
temporal resolving power of the ear. But this gain 
has a price, for coarticulation produces a complex 
and singularly linguistic relation between acoustic 
signal and the phonetic message it conveys. As is 
well known, the signal for each particular 
phonetic element is vastly different in different 
contexts, and there is no direct correspondence in 
segmentation between signal and phonetic 
structure. It is to manage this language-specific 
relation between signal and appropriate percept 
that the specialization for speech perception is
adapted. Support for the hypothesis that there is 
such a specialized speech mode of perception is to 
be found elsewhere. (See references given at the 
beginning of this section.) What is important for 
our present purposes is only that, according to this 
hypothesis, the percepts evoked by the sounds of 
speech are immediately and specifically phonetic. 
There is no need, as there is on the conventional 
view, for a cognitive translation from an initial 
auditory representation, simply because there is 
no initial auditory representation.

Now one can see plainly the difference between 
speech and reading/writing. In reading, to take 
the one case, the primary perceptual
representations are, as we have seen, inherently visual,
not linguistic. Thus, these representations are, at
best, arbitrary symbols for the natural units of
language, hence unsuited to any natural language
process until and unless they have been
translated into linguistic form. On the other hand,
the representations that are evoked by the sounds
of speech are immediately linguistic in kind,
having been made so by the automatic processes 
of the phonetic module. Accordingly, they are, by 
their very nature, perfectly suited for the further 
automatic and natural processing that the larger 
specialization for language provides. 

As for parity and its development in evolution 
and in the child, it is, on the unconventional view,
built into the very bones of the system. For what 
evolved, on this view, was a specifically phonetic 
process, together with representations that were 
thus categorically set apart from all others and 
reserved for language. The unconventional view 
also allows us to see, as the link between sender 
and receiver, the specifically phonetic gestures 
that serve as the common coin for the conduct of 
their linguistic business. There is no need to 
establish parity by means of (innate) phonetic
ideas (e.g., labels, prototypes, distinctive
features) to which the several nonlinguistic
representations must be cognitively associated.

HOW CAN READING/WRITING BE MADE TO EXPLOIT THE MORE
NATURAL PROCESSES OF SPEECH?

The conventional view of speech provides no 
basis for asking this question, since there exists, 
on this view, no difference in naturalness. It is 
perhaps for this reason that the (probably) most 
widely held theory of reading in the United States 
explicitly takes as its premise that reading and 
writing are, or at least can be, as natural and easy 
as speech (Goodman & Goodman, 1979). According 
to this theory, called 'whole language,' reading
and writing prove to be difficult only because
teachers burden children with what the theorists
call 'bite-size abstract chunks of language such as
words, syllables, and phonemes' (Goodman, 1986).
If teachers were to teach children to read and 
write the way they were (presumably) taught to 
speak, then there would be no problem. Other 
theorists simply ignore the primacy of speech as 
they describe a reading process in which purely 
visual representations are sufficient to take the 
reader from print to meaning, thus implying a 
'visual' language that is somehow parallel to a
language best described as 'auditory' (see, for 
example, Massaro & Schmuller, 1975; F. Smith, 
1971). 

On the unconventional view, however, language
is neither auditory nor visual. If it seems to be
auditory, that is only because the appropriate
stimulus is commonly acoustic (pace Aristotle). 
But optical stimuli will, under some conditions, 
evoke equally convincing phonetic percepts, 
provided (and this is a critical proviso) they 
specify the same articulatory movements (hence, 
phonetic gestures) that the sounds of speech 
evoke. This so-called 'McGurk effect' works
powerfully when the stimuli are the natural 
movements of the articulatory apparatus, but not 
when they are the arbitrary letters of the 
alphabet. Thus, language is a mode, largely 
independent of end organs, that comprises 
structures and processes specifically adapted to 
language, hence easy to use for linguistic 
purposes. Therefore, the seemingly sensible 
strategy for the reader is to get into that mode, for 
once there, he is home free; everything else that 
needs to be done by way of linguistic processing is 
done for him automatically by virtue of his 
natural language capacity. As for where the 
reader should enter the language mode, one 
supposes that earlier is better, and that the 
phonological component of the mode is early 
enough. Certainly, making contact with the 
phonology has several important advantages: it 
makes available to the reader a generative scheme 
that comprehends all the words of the language, 
those that died yesterday, those that live today, 
and those that will be born tomorrow; it also
establishes clear and stable representations in a
semantic world full of vague and labile meanings;
and, not least, it provides the natural grist for the
syntactic mill, that is, the phonological
representations that are used by the working memory
as it organizes words into sentences.

The thoroughly visual way to read, described 
earlier, is the obvious alternative, doing 
everything that natural language does without 
ever touching its structures and processes. But 
surely that must be a hard way to read, if, indeed, 
it is even possible, since it requires the reader to 
invent new and cognitively taxing processes just 
in order to deal with representations that are not 
specialized for language and for which he has no 
natural bent.

WHAT OBSTACLE BLOCKS THE 
NATURAL PATH?

As we have seen, the conventional view allows
two equivalent representations of language (one
auditory, the other visual), hence two equally
natural paths that language processes might 
follow. In that case, such obstacles as there might 
be could be no greater for the visual mode; indeed,
accepting the considerations I mentioned earlier,
we should have to suppose that visual
representations would offer the easier route.

The unconventional view, on the other hand,
permits one to see just what it is that the
would-be reader and writer (but not the speaker/listener)
must learn, and why the learning might be at
least a little difficult. The point is that, given the
specialization for speech, anyone who wants to
speak a word is not required to know how it is
spelled; indeed, he does not even have to know
that it has a spelling. He has only to think of the 
word; the speech specialization spells it for him, 
automatically selecting and coordinating the 
appropriate gestures. In an analogous way, the 
listener need not consciously parse the sound so as 
to identify its constituent phonological elements. 
Again, he relies on the phonetic specialization to 
do all the hard work; he has only to listen. 
Because the speech specialization is a module, its 
processes are automatic and insulated from 
consciousness. There are, therefore, no cognitively 
formed associations that would make one aware of 
the units being associated. Of course, the 
phonological representations, as distinguished 
from the processes, are not so insulated; they are 
available to consciousness (indeed, if they were
not, alphabetic scripts would not work), but there
is nothing in the ordinary use of language that
requires the speaker/listener to put his attention
on them. The consequence is that experience with
speech is normally not sufficient to make one
consciously aware of the phonological structure of
its words, yet it is exactly this awareness that is
required of all who would enjoy the advantages of
an alphabetic scheme for reading and writing. 

Developing an awareness of phonological 
structure, and hence an understanding of the 
alphabetic principle, is made the more difficult by 
the coarticulation that is central to the function of
the phonetic specialization. Though such 
coarticulation has the crucial advantage of 
allowing speech production and perception to 
proceed at reasonable rates, it has the 
disadvantage from the would-be reader/writer's 
point of view that it destroys any simple 
correspondence between the acoustic segments 
and the phonological segments they convey. Thus, 
in a word like 'bag,' coarticulation folds three
phonological segments into one seamless stretch of 
sound in which information about the several 
phonological segments is thoroughly overlapped. 
Accordingly, it avails the reader little to be able to
identify the letters, or even to know their sounds.
What he must know, if the script is to make sense,
is that a word like 'bag' has three pieces of
phonology even though it has only one piece of
sound. There is now much evidence (1) that
preliterate and illiterate people (large and small)
lack such phonological awareness; (2) that the
amount of awareness they do have predicts their
success in learning to read; and (3) that teaching
phonological awareness makes success in reading
more likely. (For a summary, see, for example, I.
Y. Liberman & A. M. Liberman, 1990.)

WHY SHOULD THE OBSTACLE LOOM
ESPECIALLY LARGE FOR SOME? 

Taking the conventional view of speech seriously
makes it hard to avoid the assumption that the
trouble with the dyslexic must be in the visual
system. It is, therefore, not in the least surprising
to find that by far the largest number of theories
about dyslexia do, in fact, put the problem there.
Thus, some believe that the trouble with dyslexics
is that they cannot control their eye movements 
(Pavlides, 1981), or that they have problems with 
vergence (Stein, Riddell, & Fowler, 1989) or that 
they see letters upside down or wrong side to 
(Orton, 1937), or that their peripheral vision is 
better than it should be (Geiger & Lettvin, 1989), 
and so on. 

The unconventional view of speech directs one's 
attention, not to the visual system and the various 
problems that might afflict it, but rather to the 
specialization for language and the reasons why 
the alphabetic principle is not self-evident. As we
have seen, this view suggests that phonological 
awareness, which is necessary for application of 
the alphabetic principle, does not come for free 
with mastery of the language. As for dyslexics— 
that is, those who find it particularly hard to 
achieve that awareness— the unconventional view 
of speech suggests that the problem might well 
arise out of a malfunction of the phonological 
specialization, a malfunction sufficient to cause 
the phonological representations to be less robust 
than normal. Such representations would 
presumably be just that much harder to become 
aware of. While it is difficult to test that
hypothesis directly, it is possible to look for
support in the other consequences that a weak
phonological faculty should have. Thus, one would
expect that dyslexics would show such other
symptoms as greater-than-normal difficulty in
holding and manipulating verbal (but not
nonverbal) materials in working memory, in
naming objects (that is, in finding the proper
phonological representation), in perceiving speech
(but not nonspeech) in noise, and in managing
difficult articulations. There is some evidence that
dyslexics do show such symptoms. (For a
summary, see I. Liberman, Shankweiler, & A.
Liberman, 1985.)

WHAT ARE THE IMPLICATIONS FOR A 
THEORY OF SPEECH? 

Those who investigate the perception and
production of speech have been little concerned to
explain how the processes differ so fundamentally
in naturalness from those of reading and
writing. Perhaps this is because the difference is
so obvious as to be taken for granted and so to
escape scientific examination. Or perhaps the
speech researchers believe that explaining the
difference is the business of those who study
reading and writing. In any case, neglect of the
difference might be justifiable if it were possible
for a theory of speech to have no relevant
implications. But a theory of speech does
inevitably have such implications, and, as has
been shown, the implications of the conventional
theory run counter to the obvious facts. My 
concern in this paper has been to show that, as a 
consequence, the conventional theory is of little 
help to those who would understand reading and 
writing. Now I would suggest that, for exactly the 
same reason, the theory offers little help to those 
who would understand speech, for if the theory 
fails to offer a reasonable account of a most 
fundamental fact about language, then we should 
conclude that there is something profoundly 
wrong with it.

The unconventional theory of speech described 
in this paper was developed to account for speech, 
not for the difference between its processes and 
those of reading and writing. That it nevertheless 
shows promise of also serving the latter purpose 
may well be taken as one more reason for 
believing it.

Linguistic Awareness and Orthographic Form*

Ignatius G. Mattingly†



INTRODUCTION: THE TAXONOMY OF 
WRITING SYSTEMS 

To impose some pattern on the vast array of
writing systems, present and past,1 several
investigators have proposed typologies of writing
(Gelb, 1963; Hill, 1967; Sampson, 1985;
DeFrancis, 1989; see DeFrancis for a review).
While typology for its own sake may seem a 
dubious goal, these proposals bring to notice 
certain interesting questions. 

Consider first the problem posed by logograms. 
It is generally recognized that the signs found in 
writing fall into two broad categories: logographic 
and phonographic. Logograms stand for words, or 
more precisely, morphemes. Thus, in Sumerian
writing, there is a logogram that stands for the
morpheme ti, 'arrow.' Phonographic signs stand for
something phonological: syllables or phonemic 
segments. Thus, in Old Persian, there is a sign for 
the syllable da, and in Greek alphabetic writing, a 
sign for the vowel a. This distinction suggests that 
writing systems might be classified according to 
whether they are logographic or phonographic. 
But the attempt to impose such a classification is 
embarrassed by the fact that while the many 
systems in the West Semitic tradition are indeed 
essentially phonographic and have no logograms, 
writing systems of all other traditions use both 
logograms and phonograms. There have been no 
purely logographic systems: phonographic signs 
are found in all traditions. 

In these circumstances, Gelb sets up a hybrid
category 'word-syllabic,' in which he includes
Sumerian, Egyptian (whose phonographic signs he 
takes to be syllabic^), and Chinese. Other 
orthographic taxonomists allow a writing system
to belong to two different categories. Thus for Hill,
Egyptian is both 'phonemic' and 'morphemic,' and
for Sampson, Japanese is both 'phonographic' and
'logographic.' DeFrancis, recognizing that
logograms are neither necessary nor sufficient for
an orthography, more sensibly treats logography
as an optional accompaniment to various 
phonographic categories. But the question of 
interest is why logograms should play only this 
secondary role, why there have been no pure 
logographies. 

A second problem arises in sorting out the 
phonographic categories. Here one might recognize,
with DeFrancis, systems like Sumerian or
Linear B, in which the phonographic signs stand
for syllables; systems like Egyptian or Phoenician,
in which they stand for consonants; and systems
like Greek or English, in which they stand for 
both consonants and vowels (plene systems). 

The distinction between consonantal and plene 
systems, however, proves to be less than rigid. In 
Egyptian, the letters for j, w, and ʔ are used to
write i, u, and a, respectively, in foreign names
(Gelb, 1963). Phoenician, indeed, is strictly
consonantal, but the other 'consonantal' systems
deriving from it all have some convention for
transcribing vowels when necessary. For example,
in Aramaic, the letters yodh, waw, and he (or
aleph) were used to write final i, u, and a,
respectively, and to render vowels in foreign
names (Cross & Freedman, 1952). In Masoretic
Hebrew, Arabic, and various Indic systems,
vowels are regularly indicated by diacritic marks
on consonant letters. And, of course, the first
clearly plene system, the Greek alphabet, is a
development from the Phoenician consonantal
system. The taxonomist thus has to decide where
to draw the line between essentially consonantal 
systems, hybrid systems, and undoubted plene 
systems. Perhaps the wisest course is the one 
followed by Sampson: simply to classify all these 
systems as 'segmental.'

Syllabic systems, in contrast, are clearly a
separate category and present no problem to the
taxonomist. There is no writing system that must
be regarded as a hybrid between a syllabic and a
segmental system. Syllabic systems show no
tendency to analyze syllables into segments. What
is found, rather, is that when analysis becomes
necessary, complex syllables are analyzed into
simpler syllables. Thus, neither the
Mesopotamian nor the Mayan syllabaries had
signs for all possible C1V1C2 syllables in their
respective languages. Instead, such syllables were
written in Mesopotamian as if they were C1V1 +
V1C2 (Driver, 1976) and in Mayan as if they were
C1V1 + C2V1 (Kelley, 1976). Similarly, Greek
C1C2V1... syllables were written in Linear B as
C1V1 + C2V1 + ... (Ventris & Chadwick, 1973). Nor,
despite suggestions to the contrary by Gelb and
DeFrancis, has a syllabic system ever developed
into a segmental system, or conversely.^ It cannot
be excluded that the Egyptians may, as DeFrancis
says (following Ray, 1986), have gotten the idea of
writing from the Sumerians. But there is certainly
no reason to believe that they borrowed the idea of
syllabic writing from the Sumerians and then 
adapted it to consonantal writing, in the way that 
the Greeks may be said to have borrowed the idea 
of consonantal writing from the Phoenicians and 
adapted it to plene writing. The various 
orthographic traditions are remarkably
self-consistent in this matter. The Mesopotamian,
Chinese, Cretan and Mayan traditions began and
remained syllabic; the Egyptian and West Semitic
traditions began and remained segmental.
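
The way syllabaries break complex syllables into simpler syllables, rather than into segments, can be made concrete with a small sketch. It is only an illustration of the three rewriting rules just cited (Mesopotamian C1V1 + V1C2, Mayan C1V1 + C2V1, Linear B C1V1 + C2V1 for an initial cluster). The function names and the sample consonants and vowels are hypothetical, chosen for the example, and are not attested spellings taken from the sources.

```python
def spell_cvc(c1: str, v1: str, c2: str, tradition: str) -> list[str]:
    """Spell a C1V1C2 syllable as a sequence of simpler syllable signs.

    Hypothetical helper: each sign in the result is itself a syllable,
    never a single consonant or vowel segment.
    """
    if tradition == "mesopotamian":
        # C1V1C2 written as C1V1 + V1C2: the vowel is written twice.
        return [c1 + v1, v1 + c2]
    if tradition == "mayan":
        # C1V1C2 written as C1V1 + C2V1: the vowel is repeated after C2.
        return [c1 + v1, c2 + v1]
    raise ValueError(f"no CVC rule sketched for tradition {tradition!r}")


def spell_ccv_linear_b(c1: str, c2: str, v1: str) -> list[str]:
    """Spell an initial cluster C1C2V1 as C1V1 + C2V1, as in Linear B."""
    return [c1 + v1, c2 + v1]


if __name__ == "__main__":
    print(spell_cvc("b", "a", "r", "mesopotamian"))  # ['ba', 'ar']
    print(spell_cvc("b", "a", "r", "mayan"))         # ['ba', 'ra']
    print(spell_ccv_linear_b("t", "r", "i"))         # ['ti', 'ri']
```

In every case the output signs are of the shape CV or VC. None of the rules ever yields a sign for a bare consonant, which is the sense in which these systems analyze syllables only into other syllables.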

If the main purpose here were to arrive at a 
taxonomy of writing systems, the conclusion 
would have to be that there are two primary 
categories: syllabic and segmental. Either of these 
may or may not be accompanied by logograms. 
Transcription of vowels in segmental systems is a
matter of degree, with Phoenician at one end of
the scale and Greek at the other. The interesting
question, however, particularly given the degree of
overlap or hybridization that is found between
logographic and phonographic categories, and
between consonantal and plene categories, is why
the syllabic and segmental categories have
remained so distinct.

In an attempt to answer the questions just 
posed, it is necessary to consider why an 
orthography can make reading and writing 
possible, what constraints there are on the form of 
orthographies, how orthographies could have been 
invented, and what happens when orthographies 
are transmitted from one culture to another. 



WHY READING AND WRITING ARE 
POSSIBLE^ 

When a listener has just heard an utterance in a 
language he knows, he has available for a brief 
time not only his understanding of the semantic 
and pragmatic content of the utterance (the 
speaker's message), but also a mental
representation of its linguistic structure. The basis for
this claim is that a linguist, by analyzing the
intuitions of informants about utterances in their
native language (such as that two utterances are
or are not the same word, or that a certain word is
the subject of a sentence), can formulate a
coherent grammar, consistent with grammars
that would be formulated by other linguists
working with other informants on the same
language. This holds true even if, as is typically
the case for a language with no writing system,
the informants are quite unaware of the linguistic
units into which utterances in their language can
be analyzed. Because the informants' intuitions 
are apparently valid, they must be based on 
linguistic representations of some kind. 

While linguists are not in total agreement about 
the nature of the linguistic representation of an 
utterance, it seems reasonably clear that such a
representation must include the syntactic
structure, the selection of lexical items and their
component morphemes, the phonological
structure, and the phonetic structure. The linguist's
syntactic diagrams and phonological and phonetic
transcriptions are formal reconstructions of
different levels of the representation. These levels
are not independent of one another. Syntax
constrains lexical choice, lexical choice determines
morphology and phonology, and syntax and phonology
determine phonetic structure. The representation
thus has extensive inherent redundancy.

The linguistic representation is strictly 
structural rather than procedural. The listener 
has no access to the many intermediate steps he 
must presumably go through in the course of 
parsing the utterance, so that these steps are 
not represented. Acoustic details such as formant
trajectories are not part of the linguistic 
representation, simply because the listener 
does not perceive them as such, but only the 
phonetic events they reflect. Other aspects of the 
utterance, such as individual voice quality, 
speaking rate, and loudness, which the listener
can hear, must be presumed to be excluded 
because they are not linguistic at all and never 
serve to mark a linguistic difference between two
utterances.

Access must be distinguished from awareness.
All normal language users, it has been claimed,
have access to the contents of linguistic
representations. This means that they have a potential
ability to introspect and report on significant
details of the representation, and to regard it as a
structure of phrases, words, and segments, not
that they can actually do so. The representation is
a complicated affair, and a person who is not
'linguistically aware' can no more be expected to
notice its characteristic units and structure than
an electronically naive person can be expected to
appreciate the units and structure of a circuit
diagram (Mattingly, 1972). Linguistic awareness
must in large part be acquired. The principal
stimulus for linguistic awareness in modern
cultures is literacy (Morais, Cary, Alegria, &
Bertelson, 1979). Unlike illiterate adults or
preliterate children, those who have learned to read can
readily report on and manipulate at least those
units of the linguistic representations of spoken
utterances to which units of the orthography
correspond (Read, Zhang, Nie, & Ding, 1986).
However, there must certainly be other sources of
linguistic awareness: Long before writing was
known, poets composed verse in meters requiring
strict attention to subtle phonological details.

It is not agreed how linguistic representations 
are created. On one view, they are a byproduct of 
the cognitive processes by which utterances are 
analyzed. Linguistic information, recovered step 
by step from the auditory image of the input 
signal, is temporarily represented in memory 
until, at a later stage, the speaker's message can
be computed (Baddeley, 1986). The difficulty with 
this view is that, as has been noted, the language 
user seems to have no access to the supposedly 
cognitive analytic steps that must precede the 
formation of the representation or to the 
subsequent steps by which the message is derived 
from this representation. An alternative view is
that the representation, as well as the message
itself, is not a byproduct but a true output of a
specialized, low-level processor (the 'language
module') whose internal operations, being
inaccessible to cognition, have no cognitive
byproducts (Fodor, 1983). This view implies that
the linguistic representation must have some
biological function other than communication, for
which the message alone would suffice. What this
function might be is unclear (but see Mattingly,
1991, for some speculation).

So far, the cognitive linguistic representation 
has been considered just as the product of the 
perception of utterances. But such representations
are produced in the course of other modes of
linguistic processing as well. Thus, a linguistic 
representation is formed in the production of an 
utterance, so that the speaker knows what it is he 
has just said. And when one rehearses an 
utterance in order to keep it in mind verbatim, 
what presumably happens is that the linguistic 
processor uses a decaying linguistic repre- 
sentation to construct a fresh version of the 
representation, and incidentally, of the message. 
This seeming defiance of entropy is possible for 
linguistic representations (as it may not be for 
mental representations in general) because of 
their high inherent redundancy. 

Consideration of rehearsal also shows that the 
linguistic representation can be an input to as 
well as an output from the linguistic processor. 
Even more significantly, for the present purposes, 
a representation not originally produced by pri- 
mary processes of perception or production can be 
such an input. An introspective, linguistically 
aware person can readily compose a 'synthetic'
linguistic representation according to some arbi- 
trary criterion: the first five words he can think of 
that begin with /b/, for example. This is obviously 
a very partial representation: just a sequence of 
phonological forms drawn from the lexicon, with- 
out explicit phonetics or syntax. But if this se- 
quence is rehearsed, the phonetic level, together 
with whatever syntactic structure or traces of 
meaning may be accidentally implicit in the se- 
quence, will be computed, just as if the sequence 
were what remained of a natural representation 
resulting from an earlier act of production, per- 
ception, or rehearsal. All that is required for a 
synthetic representation to serve as input for 
computing a natural one is that it contain enough 
information so that the rest of the structure of the 
utterance is more or less determined. 

These various considerations suggest how it is 
that one linguistically aware language user can 
communicate with another, not by means of 
speech, but by means of synthetic representations, 
provided a way of transcribing such 
representations, that is, an orthography, is 
available. The writer speaks some utterance (at 
least to himself), creating a linguistic repre- 
sentation. The orthography enables him to 
transcribe this representation in some very partial 
fashion. From this transcription, the reader 
constructs a partial, synthetic linguistic repre- 
sentation. Such a representation is enough to 
enable the reader's linguistic processor to compute
a complete, natural representation, as well as the
writer's intended message.

If we compare what happens between writer and
reader with what happens between speaker and
hearer, it can be seen that the difference is much
more than merely a matter of sensory modality. In
speech perception, there is a natural and unique
set of 'signs' (the acoustic events that the human
vocal tract can produce), and they are already in a
form suitable for immediate linguistic processing
(Liberman, this volume). Only the output of this
processing is a linguistic representation. The input
speech signal is in no sense a partial linguistic
representation, but rather a complete representation
of a very different kind. Moreover, the specification
of the complex relation between the phonetically
significant events in the signal and the units
of the linguistic representation is acquired
precognitively (Liberman & Mattingly, 1991); it does
not have to be learned. Indeed, as has been
remarked, the hearer has no access to the acoustic
events, and may have little or no awareness of the
units of the linguistic representation. In reading,
on the other hand, there is no one natural set of
input symbols. Linguistic processing must therefore
be preceded by a stage having no counterpart
in speech perception: a cognitive translation from
the orthographic signs to the units of the synthetic
linguistic representation. The beginning reader
must therefore deliberately master the mapping
between the signs and the units, and for this he
must have an awareness of the appropriate
aspects of the linguistic representation.

CONSTRAINTS ON ORTHOGRAPHIC 
FORM 

What psychological factors constrain the form of
an orthography? Gelb (1963) makes a useful
distinction between 'outer form' (the shape of the
visible symbols and their arrangement in a text)
and 'inner form' (the nature of the
correspondence of the symbols to linguistic units).
Beyond the trivial requirement that the symbols
be visually discriminable, there appear to be no
particular psychological constraints on outer form.
The shapes of the signs in the writing systems of
the world and the way they are arranged are
extremely various, and such limitations as exist
are to be accounted for not by cognitive or
linguistic factors but by practical ones, such as the
nature of the writing materials available and
what patterns are easily written by hand, or by
esthetic ones, such as the beauty of particular
stroke patterns. This variety is possible because,
as has just been seen, a cognitive translation is
required for reading and writing in any event.
This price having been paid, outer form can vary
almost without limit. 

Inner form, on the other hand, is highly 
constrained. In the first place, the orthography 
must correspond to the linguistic representation,
because there is no other cognitive path to
linguistic processes. This is the reason that
proposals to treat spectrographic displays of
speech as, in effect, an orthography the deaf could
learn to read (Potter, Kopp, & Kopp, 1966) are not
likely to succeed. On the one hand, the reader of
spectrograms cannot process the visually-
presented spectral information as a listener can
process the same information in the auditorially-
presented and biologically-privileged speech
signal. On the other hand, the spectrogram reader
has no natural cognitive access to raw spectral
events, and, a fortiori, no awareness of them.
Therefore, even if he could somehow synthesize a
cognitive spectral representation from the visible
one, there is no reason to believe it could be an
input to linguistic processes. All he can do is to 
apply his cognitive knowledge of acoustic 
phonetics to the task of inferring the linguistic 
representation from the spectrogram. Because the 
relation between spectral patterns and even the 
most concrete level of this representation, the 
phonetic level, is extremely complex, and a great 
deal of extraneous information is present, 
'reading' spectrograms is a slow and unreliable
process. Analogous observations, obviously, could
be made with respect to other records of physical
activity in which linguistic information is implicit,
such as the speech waveform or traces of 
articulatory movements. What has to be 
transcribed, then, is some level or levels of the 
linguistic representation itself. 

However, certain levels of the linguistic 
representation are seldom or never transcribed in 
traditional orthographies. For example, syntactic 
structure is never transcribed. The few features of 
orthography that might be considered syntactic, 
such as punctuation and sentence-initial 
capitalization, are more reasonably regarded as 
transcriptions of prosodic elements. Why is syntax 
thus avoided? It is not just that tree diagrams are 
cumbersome to draw and nested brackets difficult
to keep track of, but that the syntactic structure
alone would be insufficient to specify a particular
sentence: Each possible phrase marker is shared
by an indefinitely large number of sentences. It
would therefore be necessary that a syntactic
orthography also transcribe in some way the
particular lexical choices. But if this is to be done,
the phrase marker itself becomes redundant,
because (barring some well-known types of
structural ambiguity, such as those discussed by
Chomsky, 1967) the words, and the order in which 
they occur, are themselves sufficient to specify 
syntactic structure. 

Again, someone who supposed that speech and
writing converged at the lowest conceivable level,
given the difference of modality, might expect that
the most efficient form of writing would be a
narrow phonetic transcription (see Edfeldt, 1960).
This transcription would correspond to the output
of the phonological component of the grammar,
presumably the level of the linguistic
representation closest to the speech signal itself. Owing to
contextual variation, higher-level units such as
phonemes, syllables, morphemes, or words are not
consistently transcribed or explicitly demarcated
in such a transcription. But, in contrast to the
syntactic orthography just considered, more than
enough linguistic information to specify the
linguistic representation would nevertheless be
implicit. Why is such an orthography not found? A
partial answer is that because, as has been
suggested, writing and speech are not, in fact, so
simply related, there is no particular advantage to a
low-level, phonetically veridical representation. 
Moreover, it seems more difficult to attain aware- 
ness of phonetic details insofar as they are pre- 
dictable. Once the language-learner is able to rep- 
resent words phonemically, the phonetic level 
seems to sink below awareness. But as will be 
seen, there is a still more fundamental reason why 
a narrow phonetic transcription would be imprac- 
tical. 

It is important to distinguish between the 
linguistic unit used for the actual processing of an 
utterance by writer and reader, and the linguistic 
units to which the various graphemic units 
correspond. Elementary graphemic units 
correspond to phonemes (English letters or 
digraphs), syllables (Japanese kana), or 
morphemes (simple Chinese characters). These are 
usually organized into complex units that have 
been called "frames" (Wang, 1981). A spelled word 
in English, a complex Chinese character, a 
grouping of Egyptian hieroglyphics are examples. 
Frames are usually demarcated by spaces in 
modern writing, but other demarcative symbols 
have been used. Sometimes the frame is implicit: 
The structure of the frame itself may be sufficient 
to demarcate it from adjacent frames, as in 
Japanese, where a kanji logogram or logograms is 
regularly followed by kana syllable signs 
specifying affixes. Some orthographies, such as 
those early alphabetic orthographies in which 
there is no demarcative information of any kind, 
have no frames larger than their elementary 
signs. Frames often correspond to linguistic 
words, but not always: In Chinese and Sumerian, 
they correspond to morphemes. 

By "unit of transcription" is meant the linguistic 
unit that the writer actually transcribes and the 
reader cognitively translates to form the synthetic 
linguistic representation. One might expect that 
the units of transcription for a particular 
orthography would be those to which its frames 
corresponded. Thus, in English, the frames are 
consistent spellings of words, and the experienced 
reader's intuition is surely that he reads word by 
word and not letter by letter, as he would if the 
transcription unit were the segment. This 
intuition is borne out by demonstrations of "word 
superiority." In these experiments, it is found, for 
example, that subjects can recognize a letter 
faster and more accurately when it is part of a 
real written word than when it appears alone or in 
a nonword (Reicher, 1969). This result suggests 
that in the case of a real word, subjects can use 
the orthographic information to recognize the 
word very rapidly, and then report the letters it 
contains. If the segment were the transcription 
unit, the letters corresponding to the segments 
should be recognized and reported faster than the 
words. 

However, it is possible that the unit of 
transcription does not really depend on the frame 
used in a particular orthography, but is in fact 
always the word. One reason for believing this is 
that the word has to be the most efficient unit of 
transcription, because words are the largest 
lexical structures. Anything smaller would require 
processing more units per utterance; anything 
larger could not be readily coded orthographically. 

Chinese writing allows a test of this possibility. 
A Chinese word consists of one or more 
monosyllabic morphemes. In the writing, 
characters are the frames and correspond to these 
morphemes. Words as such are not demarcated. 
There is some evidence, however, that the unit of 
transcription is nonetheless the word. In a recent 
experiment (Mattingly & Xu, in preparation), 
Chinese speakers were shown sequences of two 
characters on a CRT. In half the sequences, one of 
the characters was actually a pseudocharacter, 
consisting of two graphic components that in 
actual writing occur separately as components of 
other characters, but not together in the same 
character. Of the sequences in which both 
characters were real, half were real bimorphemic 
words and half were pseudowords. The subject's 
task was to respond "Yes" if both characters in a 
sequence were genuine and "No" if either was a 
pseudocharacter. Subjects performed this task 
faster for words than for pseudowords, and it was 
possible to show that this was not simply an effect 
of the higher transitional probabilities of the word 
sequences, but rather a valid "word superiority" 
effect. This result, like that of an earlier 
experiment by C. M. Cheng (1981, summarised in 
Hoosain, 1991), suggests that despite morphemic 
framing and the absence of word boundaries, the 
word is the transcription unit for Chinese readers. 
Other writing systems in which words are not 
framed remain to be investigated. 

But if word-size frames are not essential for 
reading word by word, why is a narrow phonetic 
transcription an unlikely orthography? The reason 
must be that the shapes of words in such a 
transcription are context-sensitive and thus difficult to 
recognize. (Notice what happens to [hænd], hand, 
in [hæntuːlz], hand tools, [hæŋgrəneɪd], hand 
grenade, [hæmpɪkt], hand picked, etc.) The reader 
is therefore forced to process the transcription 
symbol by symbol, a slow and arduous procedure. 
In Chinese, on the other hand, though word 
boundaries are absent, the form of an orthographic 
word is constant, or at least not subject to 
contextual variation. It is suggested that this is a 
minimal constraint that all writing systems must 
meet, so that words can serve as units of 
transcription. 

Although words are the transcription units, 
writing always employs graphemic units 
corresponding to linguistic units smaller than the 
word. It might seem possible, in principle, to have 
a pure logographic system, consisting simply of 
one monolithic symbol for each word. But the 
difficulty with such a system is that while the 
lexicon of a language is, in principle, finite, it is in 
practice, indefinite: New words are continually 
being coined or borrowed. In some cases (a nonce 
word or an unusual foreign name, for example) it 
would make little sense to provide a special 
logogram. A writer could thus find himself with no 
means of writing a particular word because no 
logogram for it existed. Or, of course, he could be 
stuck simply because he did not know the correct 
logogram. An actual writing system ensures that 
the writer will never be in this situation by 
providing a system of spelling units. The 
availability of the spelling system guarantees that 
the orthography will be "productive," that is, that 
the writer who has mastered the spelling rules 
will always have some way (though it may not be 
the "correct" or standard way) to write every word 
in the language (Mattingly, 1985). 

The only linguistic units that have served as the 
basis for spelling units are syllables and 
phonemes. It might be thought that morphemes 
could be the basis of a spelling system, and some 
(e.g., Sampson, 1985) have argued that Chinese 
has such a system, because the characters 
correspond to morphemes. This is true, but, as has 
already been noted, these morphemic units are 
frames: Relatively few of the characters in the 
inventory are simple logograms. Over 90% are 
phonetic compounds, each consisting of two 
graphic components that (in general) occur also as 
separate logographic characters. One of these, the 
"phonetic," stands, in principle, for a particular 
phonological syllable, and the set of phonetics 
thus constitutes a syllabary. The other, the 
"semantic," is one of 214 determiners that serve to 
mitigate the extensive homophony of Chinese: The 
number of monosyllabic morphemes far exceeds 
the number of phonologically distinct syllables. 
The situation is complicated, however, because 
there is usually more than one phonetic 
corresponding to a particular phonological syllable 
(there are about 4000 in all for about 1300 
phonologically distinct syllables), and because, 
through various accidents of linguistic history, a 
phonetic often has different phonological values in 
different characters. But these circumstances 
should not obscure the highly systematic, 
syllabographic nature of the spelling, any more 
than the existence of several spelling patterns for 
one sound, and numerous inconsistencies in letter- 
to-sound correspondence, should obscure the 
systematic, alphabetic nature of English spelling 
(DeFrancis, 1989). 

Words can indeed be analyzed into morphemes 
as well as segments and syllables, but the 
inventory of morphemes in a language, like the 
inventory of words itself, is indefinitely large and 
subject to continual change. While logograms that 
are morphemic signs can have a valuable 
supplementary function in orthography, they 
could not constitute a productive spelling system, 
and there is no orthography in which they play 
this role. 

Syllables and segments, on the other hand, have 
several properties that make them suitable as a 
basis for spelling units. First, a word can always 
be analyzed as a sequence of phonological 
elements of either type. Second, the inventory of 
syllables may be small (and indeed was small in 
all the languages for which syllabic spelling 
developed independently) and the inventory of 
segments is always small. Third, the membership 
of these inventories changes only very slowly. No 
other linguistic units have these convenient 
properties, save perhaps phonological distinctive 
features. (Because a diacritic is used to indicate 
voicing, it could be maintained that features have 
a marginal role in Japanese spelling.) 

In sum, every orthography needs to have a 
spelling system and a spelling system is 
necessarily phonographic. It is not accidental that 
all orthographies spell either syllabically or 
segmentally: there is probably no other way to 
spell. 

THE INVENTION OF WRITING 
Writing was invented, probably several times, 
by illiterates. From what has been said already, it 
follows that what had to be discovered was one or 
the other of the two possible spelling principles, 
the syllabic or the segmental, and that this must 
have required awareness of these units of the 
linguistic representation. How could the inventors 
have arrived at such awareness? 

Some linguistic units seem to be more obvious 
than others. Awareness of words can perhaps be 
assumed for most speakers, even if they are pre- 
literate or illiterate. It probably requires only a 
very modest degree of awareness to appreciate 
that an utterance is analyzable as a sequence of 
syntactically functional phonological strings, if 
only because sequences consisting of just one such 
string are quite frequent: Words may occur in 
isolation. Certainly preliterate children have no 
difficulty in understanding a task in which they 
are to complete a sentence with some word, and a 
linguist's naive informant readily supplies the 
names of objects. Awareness of syllables as 
countable units may also be fairly widespread. The 
syllable is the basis for verse in many cultures; 
preliterate children can count the number of syllables 
in a word. This kind of syllabic awareness, 
however, is probably not the same thing as being 
aware (if such is indeed the case) that the 
syllables of one's language constitute a small inventory 
of readily demarcatable units. 

These limited degrees of linguistic awareness 
are probably readily available to speakers of all 
languages. But more subtle forms of awareness 
may well have arisen only because they were 
facilitated by specific properties of certain 
languages, including, in particular, those for 
which writing was originally invented. 

Consider, first, Chinese. In the Ancient Chinese 
language, words were in general monomorphemic, 
there being neither compounding nor affixation. 
Morphemes were monosyllabic and a particular 
morpheme was invariant in phonological form. 
Because of restrictions on syllable structure, the 
inventory of syllables was small. Homophony was 
therefore very extensive, one syllable 
corresponding to many morphemes (Chao, 1968). The 
number of different characters in the Chinese writing 
system sharing a particular phonetic component 
gives some notion of the degree of homophony in 
Ancient Chinese, and this number often exceeds 
twenty. Chinese thus contrasts sharply with 
English and other Indo-European languages, in 
which morphemes vary in phonological form, may 
be polysyllabic, and may not even consist of an in- 
tegral number of syllables; syllable structure is 
complex; the number of possible syllables is rela- 
tively large; and homophony is therefore a 
marginal phenomenon. 

Since words coincided with morphemes in 
Chinese, awareness of morphemes required no 
analysis, and the use of logograms, i.e., 
morphemic signs, was an obvious move. The 
extensive homophony made "phonetic 
borrowing" (using the sign for one morpheme to 
write another morpheme with the same syllabic 
form) a strategy that was both obvious and 
productive; when a writer needed to write a 
morpheme, a sign with the required sound was 
very likely to be available. It thus became obvious 
that the number of different sounds was in fact 
small, yet every morpheme corresponded to one of 
them. Awareness of demarcatable syllable units 
thus developed. Of course, the same extensive 
homophony that fostered the discovery of these 
units also meant that their signs had to be 
disambiguated by the use of logograms as 
determiners, as in the large class of characters 
called "phonetic compounds," described earlier. 

Chinese morphophonological structure thus 
encouraged the discovery of the syllable; on the 
other hand, it did not encourage the discovery of 
the phonemic segment. There was nothing about 
this structure that would have served to isolate 
phonemes from syllables or morphemes. 

Sumerian was an agglutinative language. A 
word consisted of one or two monosyllabic CVC 
morphemes and various inflectional and deriva- 
tional affixes. Its phonology had certain properties 
that imply a preference for a CVCVC...VC syllabi- 
fication. There were no intrasyllabic consonant 
clusters; a cluster simplification process deleted 
the first of two successive consonants across syl- 
lable boundaries, resulting in such alternations as 
til/ti, 'life'; and final vowels were deleted (Driver, 
1976; Kramer, 1963). In other relevant respects, 
however, Sumerian resembled Chinese and, like 
Chinese, favored awareness of morphemes and of 
syllables as demarcatable units. Aside from the 
effects of the syllable-forming processes just 
mentioned, a root maintained an invariant 
phonological form. A root could be repeated to indicate 
plurality. Because the morphemes were monosyllabic, 
and because of the restricted syllable structure, 
the number of possible distinct syllables was 
small. These circumstances resulted, again, in 
extensive homophony. 

For a speaker of Sumerian to become aware of 
morphemes was perhaps not quite as easy as for a 
speaker of Chinese. He would have had to notice 
that words with similar meanings often had 
common components, for the most part 
corresponding to syllables. This stage of 
awareness having been achieved, morphemic 
writing is possible. From this point on, the story is 
quite similar to that for Chinese, homophony 
leading to phonetic borrowing and then to syllable 
writing supplemented with determiners. 

There is, however, one striking difference 
between the Sumerian and the Chinese writing 
systems. While Chinese makes no internal 
analysis of syllables, Sumerian does. A sign for a 
C1V1C2 morpheme could be borrowed to write a 
C1V1C3 morpheme, e.g., the RIM sign was used to 
write rin. A VC syllable sign could be used as a 
partial phonetic indicator after a logogram, e.g., 
GUL + UL. For many of the C1V1C2 syllables, as 
has been mentioned, there was no special sign; 
instead, such a syllable was written with the sign 
for the C1V1 followed by the sign for V1C2. Thus 
the syllable ral is written RA AL (examples from 
Gelb, 1963). A possible explanation of these 
various practices is that in spoken Sumerian, 
consistent with its preference for CVCVC...VC 
structure, some form of vowel coalescence took 
place when two similar vowels came together, so 
that C1V1 + V1C2 sequences became phonetically 
C1V1C2, and thus homophonous with original 
C1V1C2 syllables. Such homophony could have 
suggested analyzing and so writing the latter as 
C1V1 + V1C2. Again, CV signs as well as VC signs 
were used to indicate the endings of C1V1C2 
morphemes. For example, because of multiple 
semantic borrowing, the logogram DU could stand 
not only for du, 'leg,' but also for gin, 'go,' gub, 
'stand,' and tum, 'bring.' Which of the latter three 
was intended was indicated by writing DU NA for 
gin, DU BA for gub, and DU MA for tum (Driver, 
1976). This practice perhaps arose because the 
phonological final vowel deletion made C1V1C2 
and C1V1C2V2 sequences homophonous, 
suggesting that what followed C1V1 could be 
written in either case as if it were C2V2. Thus the 
Sumerians may have viewed C1V1C2 morphemes 
either as C1V1 + V1C2 or as C1V1 + C2V2, either of 
which was entirely consistent with their syllabic 
phonological awareness. 
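
The two analyses of a C1V1C2 syllable just described can be 
made concrete with a minimal Python sketch. The function 
names, the convention of lowercase readings and uppercase 
signs, and the default helper vowel a (taken from the DU NA, 
DU BA, DU MA examples) are assumptions introduced here for 
illustration only; the readings ral, gin, gub, and tum are 
inferred from the sign sequences cited above. Nothing here is 
meant as a reconstruction of actual scribal practice. 

```python
def as_cv_vc(syllable):
    """Analyze a C1V1C2 syllable as the sign pair C1V1 + V1C2."""
    c1, v1, c2 = syllable  # assumes exactly three one-letter segments
    return (c1 + v1).upper(), (v1 + c2).upper()


def with_cv_indicator(logogram, reading, helper_vowel="a"):
    """Write a logogram followed by a CV sign built on the reading's
    final consonant (the C1V1 + C2V2 analysis, with an assumed V2)."""
    return logogram + " " + (reading[-1] + helper_vowel).upper()


# Illustrative use with the examples discussed in the text.
print(as_cv_vc("ral"))
for reading in ("gin", "gub", "tum"):
    print(reading, "->", with_cv_indicator("DU", reading))
```

Both functions resolve a closed syllable into simpler 
syllable signs and never into individual phonemes, which is 
the point of the passage: the analysis stays syllabic 
throughout. 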

With Egyptian, in contrast to Chinese and 
Sumerian, the morphology and phonology of the 
language favored segmental 
awareness. In Afro-Asiatic languages, the roots 
are biconsonantal and triconsonantal patterns 
into which different vowels or zero (that is, no 
vowel at all) are inserted to generate a large 
number of inflected forms. Because the vowels of 
Egyptian are unknown, it is easier to illustrate 
this point with an example from another Afro- 
Asiatic language, e.g., Hebrew. From the Hebrew 
root k-t-b are derived katab, 'he wrote'; yikkateb, 'he 
will be inscribed'; ketob, 'to write'; katub, 'written'; 
miktab, 'letter'; and many other forms. Because of 
phonological restrictions, the number of different 
consonantal patterns in Egyptian was relatively 
small, and there were consequently numerous 
homophonous roots, e.g., n-f-r, 'good'; n-f-r, 'lute' 
(Jensen, 1970). 
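
The idea of a constant consonantal ground with a varying 
vocalic figure can be sketched in a few lines of Python. 
The forms below are simplified ASCII transliterations of the 
Hebrew examples, assumed here for illustration, and the 
five-letter vowel set is likewise an assumption that suits 
only this toy transliteration. 

```python
VOWELS = set("aeiou")  # assumed vowel letters of the toy transliteration


def consonants(form):
    """Return the consonants of a transliterated form, in order."""
    return [ch for ch in form if ch not in VOWELS]


def contains_root(form, root):
    """True if the root consonants occur, in order, among the form's
    consonants (prefixes and doubled consonants are allowed)."""
    remaining = iter(consonants(form))
    return all(c in remaining for c in root)


# Simplified transliterations, assumed for illustration only.
forms = ["katab", "yikkateb", "ketob", "katub", "miktab"]
for form in forms:
    print(form, "-".join(consonants(form)), contains_root(form, "ktb"))
```

Stripping the vowels leaves k-t-b in the unprefixed forms, 
and the ordered-subsequence check finds the same root inside 
the prefixed ones. A speaker who noticed this regularity 
would be attending to consonantal segments, not to syllables. 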

It is not difficult to imagine an Egyptian 
noticing that many sets of semantically similar 
words in his language had a common consonantal 
ground and a varying vocalic figure, though at 
first he may not have individuated the 
consonants. Accordingly, signs for root morphemes 
were devised. The homophony of Egyptian then 
did for phonetic segments what homophony in 
Chinese and Sumerian did for syllables. A 
morphemic sign was frequently borrowed to write 
a homophonous morpheme, e.g., NFR, the sign for 
n-f-r, 'lute,' used to write n-f-r, 'good,' or WR, 
'swallow,' used to write w-r, 'big.' The signs were 
now generalized to stand for consonantal 
sequences that were not morphemes, e.g., WR < 
WR was used to write the first part of w-r-d, 
'weary.' And because in some cases roots were 
actually uniconsonantal, and in other cases the 
second consonant had become silent, some signs 
came to stand for single consonants, and 
constituted a consonantal alphabet. Thus the d in 
w-r-d could be written with the sign D < DT, the final 
consonant in d-t, 'hand,' being actually the 
feminine suffix, not part of the root. Finally, 
logograms were employed as determiners to 
clarify ambiguous transcriptions: the spelling MN 
N H for the word m-n-h being followed by the 
determiner for 'plants' when this word had the 
sense 'papyrus plant,' the determiner for 'men' 
when it had the sense 'youth,' and the determiner 
for 'minerals' when it had the sense 'wax' 
(examples from Jensen, 1970). In this fashion, the 
Egyptians arrived at a consonantal spelling 
system. 

If the Egyptians had thus achieved segmental 
awareness, why did they not transcribe the vowels 
as well as the consonants? It is not likely that they 
were unable to hear the different vowels. The 
explanation is rather that because the vowels 
ordinarily conveyed only inflectional information, 
the writing was sufficiently unambiguous without 
such indications, just as English writing is 
sufficiently unambiguous without stress marking. 
But as has already been noted, there was a 
convention for writing vowels when necessary. 
Such writing is found very early in the history of 
Egyptian writing (Gelb, 1963). 

The Egyptians could hardly have arrived at a 
syllabic system instead. Because zero alternated 
with vowels in the generation of words, there was 
no obvious correspondence between morphemes 
and syllables or syllable sequences. And because 
of such alternations, a syllabic orthography would 
have resulted in a number of dissimilar spellings 
for the same morpheme. 

These examples suggest that the phonological 
awareness required for the invention of writing 
develops when morphemes have a highly 
restricted phonological structure (monosyllabic, in 
the case of Sumerian and Chinese; consonantal in 
the case of Egyptian) that results in pervasive 
homophony. Speakers of such languages are 
naturally guided to the invention of writing by these 
special conditions. (A corollary is that it is not 
necessary to propose a derivation of Egyptian 
from Sumerian to account for parallels in the 
development of the two systems.) On the other hand, 
Indo-European languages and many others lack 
any such restrictions, and would not have favored 
phonological awareness in this way. Indeed, one 
has to wonder whether, for such languages, 
writing could have been invented at all. 

In the early discussion of the psychology of 
reading, the precise role of phonological 
awareness in learning to read appeared equivocal. 
Is phonological awareness a prerequisite for 
reading? Or, on the other hand, does the 
experience of reading engender phonological 
awareness (Liberman, Shankweiler, Liberman, 
Fowler, & Fischer, 1977)? It was later seen, 
however, that both statements must be true: The 
beginning reader must, indeed, have some degree 
of awareness, but this awareness is increased and 
diversified in appropriate directions as a result of 
his encounter with the orthography (Morais, 
Alegria, & Content, 1987). In the same way, the 
invention of writing must have been an 
incremental process, beginning with an initial 
awareness of morphemic structure. The 
experience of working out ways to transcribe 
morphemes for which there were no logograms led 
to awareness of the syllabic or phonemic structure 
of these morphemes, and then to awareness of 
such structure generally. 

To say that the process was incremental is not 
to say that it was not quite rapid. It is noteworthy 
that in all three of the writing traditions just 
considered, evidence of spelling is found very 
early: in Sumerian writing from the Uruk IV 
stratum (Gelb, 1963); in Chinese writing of the 
Shang dynasty (DeFrancis, 1989); in Egyptian 
writing of the First Dynasty (Gelb, 1963). These 
facts are consistent with the proposal that for 
general-purpose writing, a purely logographic 
system is impractical. As has been argued, an 
orthography is not productive without a spelling 
system: The invention of the one requires the 
invention of the other. 

To the extent that this account of the invention 
of writing is plausible, it supports the dichotomy 
between syllabic and segmental spelling proposed 
earlier, for what had to be invented was one or the 
other of the two spelling principles that provide 
the basis for the classification. It should also be 
noted that the segmental principle did not develop 
in Egypt by elaborating on the syllabic principle, 
but rather by generalizing from the segmental 
transcription of morphemes: The syllable played 
no role. And, conversely, when Sumerians 
analyzed complex syllables, they did not resolve 
them into their constituent phonemes, but rather 
into simpler syllables. The discovery of one 
method almost seems to have guaranteed that the 
other would not be discovered. In effect, speakers 
of these languages come to regard them as 
essentially syllabic or as essentially segmental, 
and their writing systems reflect one of these two 
phonological theories. 

TRANSMISSION OF WRITING 
SYSTEMS 

It has already been noted that orthographic 
traditions are either consistently syllabic or 
consistently segmental. Some explanation for this 
consistency is required. It seems natural enough, 
perhaps, that a segmental tradition should not 
become syllabic, for this would appear to be a 
backward step. But that no syllabic tradition 
should have become segmental is puzzling, the 
more so because there have been at least two 
occasions when such a development might 
reasonably have been expected. The first was 
when speakers of Akkadian, an Afro-Asiatic 
language with consonantal root structure similar 
to that of Egyptian and Hebrew, borrowed 
Sumerian syllabic writing. A proper awareness of 
the morphophonology of their language would 
have suggested that they convert the Sumerian 
system into a consonantal system. But instead, 
the Akkadians preserved the syllabic character of 
the borrowed writing, even though to write the 
same triconsonantal pattern in different ways 
depending on the particular inflectional vowels 
obscured the roots of native words. Similarly, the 
Mycenaean Greeks borrowed Minoan syllable 
writing, and instead of making an alphabet out of 
it, as would have been sensible, given the 
extensive consonant clustering in Greek, they 
continued to write with signs that stood for CV 
syllables, either ignoring the "extra" consonants or 
pretending that they were syllables. This resulted 
in such bizarre transcriptions as A RE KU TU 
RU WO for alektruōn, 'cock' (Ventris & Chadwick, 
1973). What can have happened to linguistic 
awareness in these cases? 
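
How a CV-only syllabary copes with consonant clusters can be 
illustrated with a rough Python sketch. The rules coded here 
(l written with the r-series, a cluster consonant borrowing 
the next vowel in the word, a w glide after u, a word-final 
consonant left unwritten) are simplifying assumptions chosen 
to reproduce the single example above; they are not offered 
as a full account of Mycenaean spelling conventions, and the 
function name and the plain ASCII input are hypothetical. 

```python
VOWELS = set("aeiou")  # assumed vowel letters of the ASCII input


def cv_spell(word):
    """Spell a word using only V and CV signs (toy rules, see lead-in)."""
    signs = []
    i = 0
    while i < len(word):
        ch = word[i]
        if ch in VOWELS:
            # A vowel directly after u is written with a w glide.
            signs.append("w" + ch if i > 0 and word[i - 1] == "u" else ch)
            i += 1
            continue
        cons = "r" if ch == "l" else ch  # no separate l-series assumed
        nxt = word[i + 1] if i + 1 < len(word) else None
        if nxt is None:
            i += 1  # word-final consonant is simply not written
        elif nxt in VOWELS:
            signs.append(cons + nxt)  # ordinary CV sign
            i += 2
        else:
            # Cluster: pretend the consonant is a syllable by borrowing
            # the next vowel that occurs later in the word.
            echo = next((c for c in word[i + 1:] if c in VOWELS), None)
            if echo is not None:
                signs.append(cons + echo)
            i += 1
    return " ".join(s.upper() for s in signs)


print(cv_spell("alektruon"))
```

Under these assumptions the three-consonant cluster ktr costs 
two extra "dummy" syllables and the final n disappears, which 
is the kind of mismatch between spelling and spoken form that 
the text calls bizarre. 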

The explanation begins with the observation 
that the mismatches between language and 
writing observed for Akkadian and Mycenaean Greek 
are not unparalleled; they are simply fairly 
extreme cases. While an originally invented writing 
system clearly reflects the morphophonological 
structure of the language it was invented to write, 
this situation is obviously exceptional. In general, 
the system used at a particular time to write a 
particular language has been inherited from an 
earlier stage in the history of that language, or 
has been adapted from a system (itself perhaps an 
adaptation) used for some other language, or, 
most commonly, both. The consequence, in many 
cases, is that the writing seems very poorly 
suited to the spoken language. If Akkadian and 
Mycenaean Greek illustrate the risks of 
borrowing, the English writing system is a good 
illustration of the effects of orthographic inheritance. The 
phonology of English has changed considerably 
since the fifteenth century, most notably in conse- 
quence of the Great Vowel Shift, but the writing 
system has remained very much as it was then 
(Pyles, 1971). As a consequence, the system has a 
number of features that must seem very peculiar 
to the foreigner learning English: For example, 
the same letter is used to write phonetically 
dissimilar vowels, a tense vowel is denoted by an E 
after the following consonant, and a lax vowel is 
denoted by the doubling of this consonant. A 
similar account could be given for Chinese writing, 
which corresponds more closely to Classical 
Chinese than to any modern dialect. 

It cannot be doubted, given what has been 
learned in recent years about the relation between 
orthographic structure and learning to read in 
modern languages, that such complications place 
a heavy burden on the learner (Liberman, 
Liberman, Mattingly, & Shankweiler, 1980). 
What is surprising, given the close connection 
between literacy and awareness of linguistic 
representations, a connection clearly essential in 
the invention of writing, is that readers and 
writers have so often happily accepted (once they 
have learned it) an orthography that seems poorly 
matched to their language. It might have been 
expected that Akkadian cuneiform would have 
been rejected as soon as it was proposed, and that 
English orthography would by now have been 
abandoned as obsolete. But, instead, it is reported 
that the Akkadians believed their writing system 
to be of divine origin (Driver, 1976), and Chomsky 
and Halle (1968) say that "conventional [English] 
orthography is ... a near optimal system for the 
lexical representation of English words" (p. 49). 

In the case of inherited orthographies, the 
explanation may be that the orthography itself may 
determine not only which aspects of linguistic 
representations are singled out for awareness, but 
perhaps, indirectly, the character of these 
representations themselves. This could come about if 
the orthographically based, synthetic input 
representations were taken seriously by the language 
processor as evidence about the structure of the 
language, and thus led to adjustments in the 
beginning reader's morphophonology. It will be 
recalled that according to the sketch of the reading 
and writing process given earlier, the processor 
does not distinguish synthetic representations 
from natural ones. Consistent with this possibility 
is the fact that orthographic conventions 
sometimes mimic phonology: The conventions for 
marking English tense and lax vowels invite the 
reader to assume that underlying lax vowels 
become tense in open syllables and underlying 
tense vowels become lax before underlying 
geminate consonants. Such pseudophonological 
rules, as well as such derivational morphological 
relations as those between heal, health or 
telegraph, telegraphy, though at first having 
merely orthographic status, may acquire linguistic 
reality for the experienced reader. For such a 
reader, the orthography corresponds to linguistic 
representations because the representations 
themselves have been appropriately modified, and 
English orthography now indeed seems "near 
optimal." 

In the case of borrowed orthographies, a similar 
explanation may apply. The phonological 
awareness of a borrowing group, such as the 
Akkadians or the Greeks, was not guided by 
peculiarities of their own spoken language, as was 
the awareness of the original inventors of writing, 
but by the writing system they were borrowing. 
This is hardly surprising: The borrowers were not 
sophisticated consumers, comparing competing 
technologies to decide which was better for their 
particular needs. They did not realize that there 
was a choice that could be made between the two 
different spelling principles and the theories of 
phonology implicit in each. They simply embraced 
unquestioningly the spelling principle (syllabic in 
the cases considered above) used by the culture 
under whose influence they had come, just as 
beginning readers accept the principle of the 
writing system they inherit. This principle having 
been accepted, the morphophonologies of the 
borrowers adjusted so that their linguistic 
representations became, in fact, a good match to 
their syllabic orthographies. 

If this account is correct, it has to apply to the 
transmission of segmental systems, as well. A 
segmental system has obvious advantages over a 
syllabary for languages with complex syllable 
structure. But the spread of the alphabet is 
perhaps to be explained by an appeal to the forces 
of tradition rather than to those of reason. 

An orthographic tradition can perpetuate itself 
because it offers a particular brand of 
morphophonological awareness ready-made. The 
processes of introspection needed to invent writing 
in the first place are not demanded. The kind of 
awareness offered may be poorly matched to a 
particular language, but this does not impede the 
process. Whether the writing system is borrowed 
or inherited, the morphophonology of the new 
reader adjusts to meet the presuppositions of the 
system. 

CONCLUSIONS 
It has for some time been widely agreed that the 
notion of linguistic awareness is essential for an
understanding of the reading process, the
acquisition of reading, and reading disability. This notion
is likewise essential for an understanding of the
invention and dissemination of orthographies.
There are really only two possible ways to write,
the syllabic method and the segmental method,
because only by using one of these two methods is
the writer assured of being able to write any word
in his language. But for an illiterate to discover ei- 
ther of these methods, and thus be in a position to 
invent writing, requires awareness of the appro- 
priate unit of linguistic representations. 
Awareness of syllables, or, on the other hand, of 
segments, is fostered by special morpho- 
phonological properties found in those languages 
for which writing systems were invented, though 
by no means in all languages. But once it has 
become established, the writing system itself 
shapes the linguistic awareness, and even the
phonology, both of those who inherit the system 
and of those who borrow it to transcribe some 
other language. Thus, in the history of writing, 
syllabic and segmental traditions are clearly 
distinguished.

FOOTNOTES

*In L. Katz & R. Frost (Eds.), Orthography, phonology, morphology,
and meaning (pp. 1-16). Amsterdam: Elsevier Science Publishers
(1992).

†Also University of Connecticut, Storrs.

¹It will be assumed here, following Gelb (1963), Jensen (1970),
DeFrancis (1989) and others, that there are six major
orthographic traditions: (1) Mesopotamian cuneiform, beginning
with Sumerian (c. 3100 B.C.) and including Akkadian, cuneiform
Hittite, Urartian, Hurrian, Elamite, Old Persian; (2) Cretan,
including Minoan Linear A, Mycenaean Greek Linear B,
Cypriote, and Hittite hieroglyphics, all probably derived from a
common source (c. 2000 B.C.); (3) Chinese, beginning with
Chinese itself (c. 1300 B.C.) and including Korean nonalphabetic
writing and Japanese; (4) Mayan (c. 300 A.D.); (5) Egyptian
(c. 3000 B.C.); (6) West Semitic, beginning with Phoenician (c.
1600 B.C.) and including Ras Shamrah cuneiform, Old Hebrew,
South Arabic, Aramaic, and Greek alphabetic writing. From
Aramaic derive Hebrew, Arabic, and many others; from Greek
derive Etruscan, Latin, and many others. Germanic runes and
Korean alphabetic writing probably belong in this tradition also,
though the derivations are not clear. All but the most dogmatic
monogeneticists would agree that the Mesopotamian, Cretan,
Chinese, and Mayan traditions are probably independent
developments. But some scholars (e.g., Driver, 1976; Ray, 1966)
would derive Egyptian writing from Mesopotamian, and some
(e.g., Driver, 1976), with somewhat greater plausibility, would
derive West Semitic from Egyptian.
²Egyptologists and most other students of writing believe that
Egyptian phonographic signs stand for consonants, the vowels
not being regularly transcribed. But according to Gelb, they
stand instead for generalized syllables, e.g., the Egyptian sign
usually interpreted as consonantal w actually stands for wa, wi,
we, wu, or wo, according to context. It is obviously difficult to
distinguish these two accounts empirically. The only support
Gelb offers for his position is that "the development from a
logographic to a consonantal writing, as generally accepted by
the Egyptologists, is unknown and unthinkable in the history of
writing" (Gelb 1963, p. 75). But this argument is clearly circular
(Edgerton, 1952; Mattingly, 196^).
³Gelb (1952, 1963) proposed some cases in which syllabic systems
are supposed to have developed into segmental systems; but see
Edgerton (1953). Ethiopic writing, derived from the West Semitic
consonantal tradition, might be viewed as a syllabic system
derived from a segmental system, because the signs do
correspond to syllables. But, with a few exceptions, each sign
actually consists of a consonant letter plus a vowel mark, except
that a is left unmarked. As in the case of Indic systems, one
could argue about whether this is a consonantal or a plene
system, but it is certainly not a syllabic system (Sampson, 1985).
⁴The proposals in this section are developed in more detail in
Mattingly (1991).
⁵Japanese kana correspond, strictly speaking, to moras, which are
not equivalent to English syllables. But they do belong to a
general class of phonological units that can be called "syllables"
(see, e.g., Hyman, 1975).
⁶An earlier formulation of some of the proposals in this section
can be found in Mattingly (1987).
⁷DeFrancis (1950), protesting against the "monosyllabic myth,"
has suggested that there actually were many polysyllabic words
in Ancient Chinese, just as in Modern Chinese, but that only one
of the syllables in a word was transcribed in the writing. Thus,
morphemes that appear from the writing to be monosyllabic
homophones may actually have been polysyllabic morphemes
with common homophonous syllables. Y.-R. Chao's (1968)
response was that "so far as Classical Chinese and its writing
system is concerned, the monosyllabic myth is one of the truest
myths in Chinese mythology" (p. 103). For the present purpose,
however, it does not matter whether the myth is true or false.
DeFrancis's partial homophony will serve as well as the total
homophony more usually attributed to Ancient Chinese.
⁸Or, on DeFrancis' (1950) view, another morpheme having a
syllable in common.
⁹These changes in the morphophonologies of individual readers
have, by hypothesis, no basis in the spoken language and are
transmitted only from writer to reader, and not from mother to
child. Thus, though psychologically real, they are not part of the
grammar of the language as usually conceived of.

The Effects of Aging and First Grade Schooling on the
Development of Phonological Awareness*



Shlomo Bentin,† Ronen Hammer,†† and Sorel Cahan††



The independent influence of aging and schooling on the development of phonological
awareness was assessed using a between-grades quasi-experimental design. Both
schooling (first grade) and aging (5-7 years) significantly improved children's performance
on tests of phonemic segmentation, but the schooling effect was four times bigger than the
aging effect. The schooling effect was attributed to formal reading instruction, whereas the
aging effect probably reflects natural maturation and informal exposure to written
language. These data support a strong mutual relation between reading acquisition and
phonological awareness.



Phonological awareness is the aptitude of being
aware of the phonemic structure of spoken words. 
It is usually assessed by testing the subjects' 
ability to isolate and manipulate individual 
phonemic segments in words. 

Although as soon as a child is able to under- 
stand and produce speech he obviously makes 
phonemic distinctions, the ability to manipulate 
phonemic segments consciously develops only 
around the first grade in the elementary school. 
For example, Liberman, Shankweiler, Fisher, and 
Carter (1974) found that none of the pre- 
kindergartners and only 17% of the kinder- 
gartners tested were able to parse words into 
phonemes, while 70% of the first graders tested 
succeeded in doing so. 

The significant improvement in phonological
awareness at this age may be primarily ascribed 
to one of two factors (which are not mutually 
exclusive): (1) cognitive-linguistic skills which ma- 
ture at about the age of six independent of formal 
reading instruction (Bradley & Bryant, 1983);
or (2) learning to read in an alphabetic
orthography (Bertelson, Morais, Alegria, & Content, 1985).
In contrast to speech, where individual phonemes
are coarticulated and overlap in the acoustic
stream, in writing the phonemes are represented
by clearly defined orthographic segments, the
letters (see Liberman & Mattingly, 1989). Assuming
that children learn about these letter-sound
correspondences when they learn to read, it seems likely
that during the acquisition of reading skills they
become explicitly aware that words are formed of
the sounds which the letters represent. Because it
is impossible to experiment with elementary
school attendance, the effect of reading instruction
on phonological awareness has been investigated
only indirectly in studies that have relied on
natural variation: (1) between literate and illiterate
adults; (2) between different orthographic systems
(alphabetic vs. logographic) among literates; or (3)
in the emphasis upon letter-sound correspondence
between reading instruction methods within the
alphabetic system (e.g., "analytic" vs. "global"
methods).

Most of these studies suggested that learning to 
read triggers, or at least promotes the develop- 
ment of phonological awareness. For example, 
Morais, Cary, Alegria, and Bertelson (1979)
reported that the performance of illiterate adults
on tests of phonemic segmentation was inferior to
that of other adults from the same rural
community who learned to read in adulthood (see
also Morais, Castro, Scliar-Cabral, Kolinsky, &
Content, 1987; Morais, Bertelson, Cary, & Alegria,
1986). In Chinese adults, Read, Zhang, Nie, and
Ding (1986) found higher phonological awareness
in subjects who learned to read the alphabetic
(pinyin) orthographic system than in subjects who
read only the logographic system (kanji).
Equivalent results were found with children in
first grade; those who learned to read according to
the "analytic" (segmental) method performed
better on tests of phonemic segmentation than
those who learned to read by the "global" (holistic)
method (Alegria, Pignot, & Morais, 1982).

However, while the studies cited above suggest
that literacy influences the development of
phonological awareness, they do not prove this
claim. The caveat is that they all share the serious
problem of possible confounding of differences in
the extent or method of reading acquisition with
other variables that may have influenced
phonological awareness (e.g., the amount of
informal linguistic experience). Therefore, there is
still a need to specify the effect of schooling in
general, and of reading acquisition in particular, on
the sharp improvement in phonemic
segmentation ability which occurs in the first year of
schooling. Such a specification is important
particularly because claims about the causal link
between phonological awareness and literacy have
been largely based on positive correlations found
between the performance of children in tests of
phonemic segmentation and their reading skills in
English (e.g., Bradley & Bryant, 1985; Liberman,
1973; Fox & Routh, 1975; Treiman & Baron, 1981)
as well as in other languages such as Italian
(Cossu, Shankweiler, Liberman, Katz, & Tola,
1988), Swedish (Lundberg, Olofsson, & Wall,
1980), Spanish (de Manrique & Gramigna, 1984),
and French (Bertelson, 1987).

The present study circumvents the confounding
problem by utilizing a recently introduced
quasi-experimental paradigm that allows for the post
hoc disentangling of the independent effects of age
and schooling (Cahan & Davis, 1987). This
approach entails administration of the same test
to at least two adjacent grade levels and takes
advantage of the school cutoff that is imposed in
most countries. The overall cross-sectional
increase in mean test scores as a function of age is
decomposed into within-grade and between-grades
segments which can be attributed to age and
schooling effects, respectively.

Theoretically, this could be achieved by
comparing children born one day before the cutoff date
with children born one day after (Morrison, 1988);
those children will differ by only one day in age,
but by a full year of schooling. Similarly, children
who are born on the first and the last day of one
schooling year will differ in age by a full year
while being in the same grade. Unfortunately,
aside from the logistical difficulty of finding enough
children in each birth date group, this approach
suffers from a serious shortcoming of selection,
because the cutoff date is never strictly imposed.
Moreover, the exceptions are not random:
Intellectually advanced children who are slightly
younger than the official school age are often
admitted, while children who are somewhat older
than the cutoff point but insufficiently developed
may be held back an additional year (Cahan &
Davis, 1987; Cahan & Cohen, 1989). This creates
a situation of "missing" children in each grade,
particularly among children at the extreme age
points. Such selective misplacement usually leads
to overestimation of the schooling effect (Cahan &
Cohen, 1989).

A possible solution of the selection problem is to 
base the estimation of age and schooling effects on 
the predicted (rather than empirically obtained) 
mean test scores of the youngest and the oldest 
children in each grade. Prediction would be based 
on the best fitting regression of test scores on 
chronological age across the entire legal age range
in that grade, with the exclusion of the selection- 
tainted birth dates near the cutoff point. This idea 
underlies the recently proposed between-grades 
regression discontinuity design (Cahan & Davis, 
1987). In the present study we applied the same 
model to the estimation of the independent effects 
of one year of schooling (during which reading
acquisition was the primary curricular activity) 
and one year of aging on the development of 
phonological awareness as evidenced by tests of 
phonemic segmentation. 
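
The estimation logic described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' analysis: the function names, the input format (lists of age-in-months and score pairs), the choice to average the two within-grade slopes, and the toy data are all hypothetical.

```python
def fit_line(ages, scores):
    """Ordinary least-squares fit of scores on ages; returns (intercept, slope)."""
    n = len(ages)
    mean_age = sum(ages) / n
    mean_score = sum(scores) / n
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (s - mean_score) for a, s in zip(ages, scores))
    slope = sxy / sxx
    return mean_score - slope * mean_age, slope


def predicted_effects(kinder, first, cutoff_age):
    """Estimate (age_effect, schooling_effect) from within-grade regressions.

    kinder, first: lists of (age_in_months, score) pairs, assumed to have
    already been stripped of the selection-tainted birth dates near the cutoff.
    cutoff_age: age in months at the boundary between the two grades.
    """
    k_intercept, k_slope = fit_line(*zip(*kinder))
    f_intercept, f_slope = fit_line(*zip(*first))
    # Predicted (not observed) scores at the boundary: the oldest possible
    # kindergartner and the youngest possible first grader.
    oldest_kinder = k_intercept + k_slope * cutoff_age
    youngest_first = f_intercept + f_slope * cutoff_age
    schooling_effect = youngest_first - oldest_kinder
    # One year of age is 12 months along the within-grade slope; averaging
    # the two slopes is an assumption of this sketch.
    age_effect = 12 * (k_slope + f_slope) / 2
    return age_effect, schooling_effect


# Toy data: both grades gain 0.75 points per month of age, and the
# first-grade line sits 32 points above the kindergarten line.
toy_kinder = [(age, 0.75 * age - 20) for age in range(61, 72)]
toy_first = [(age, 0.75 * age + 12) for age in range(73, 84)]
toy_age_effect, toy_schooling_effect = predicted_effects(toy_kinder, toy_first, 72)
```

Because both estimates come from fitted lines rather than from the children actually observed at the boundary, dropping the birth dates nearest the cutoff does not leave a gap in the comparison.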

Method 

Design. The "between-grades" quasi-experimental
paradigm (Cahan & Davis, 1987) relies on two
assumptions: (1) the "allocation" of children to
birth dates is random, and (2) the grade level is
solely a function of chronological age, that is,
admission to school is based only on chronological
age, according to some arbitrary cutoff point, and
progression through grades is automatic.

If these assumptions are valid, the age and
schooling effects can be estimated by means of a
regression discontinuity design (Cook & Campbell,
1979), involving regressions of test scores on
chronological age. The effect of age is reflected by
the slope of the within-grade regressions, whereas
the effect of schooling is reflected in the
discontinuity between the two regression lines.

The first assumption of the model is reasonably
met. The second is more problematic because, in
practice, admission to school is not solely a
matter of the child's birth date. As mentioned in the
introduction, relatively bright children might
enter the first grade "early," whereas children who
are not sufficiently developed (intellectually or
emotionally) remain an additional year in
kindergarten. The frequency of grade misplacement is
particularly high near the official cutoff point
(which in Israel is based on the Hebrew calendar
and falls sometime in December; see Cahan &
Cohen, 1989, for details). In order to cope with this
problem of selection, we excluded from the
computation of the within-grade regressions two groups
of children: (1) children who did not fall into the
official age range of their cohort and (2) first
graders born in November or December 1982 (i.e.,
the oldest in their class), the months with the
highest proportion of "missing" children (Cahan &
Cohen, 1989).

Subjects. The sample consisted of all first
graders born in 1981 (with the exceptions
described above) attending the seven elementary
schools serving four neighborhoods of Jerusalem
(319 children of both genders), and all children
born in 1982 from the 19 kindergartens serving
the same neighborhoods (352 children of both
genders). The selected neighborhoods represented
upper-middle-class, middle-class, and
lower-middle-class populations.

Tests and Materials. Phonological awareness 
was measured by a battery of four sub-tests of 
constrained phonemic segmentation (Goldstein, 
1976; Zhurova, 1973) each containing 20 items. 
The sub-tests were selected from a battery devised
and validated in a pilot study (H. Leshem,
unpublished doctoral dissertation), and were
chosen because they did not require subjects to
perform cognitive operations other than phonemic
segmentation (for a survey of various types of
segmentation tests see Content, Kolinsky, Morais,
& Bertelson, 1986; Stanovich, Cunningham, &
Cramer, 1984). The tasks were:

1. Isolation of the first phoneme in spoken
words. The children were instructed to utter the
first phoneme in words pronounced by the
examiner.

2. Isolation of the first phoneme in
self-generated picture names. The children were
shown pictures of common objects and asked to
pronounce the first phoneme in the name of each
object.



3. Isolation of the last phoneme in spoken
words. Similar to test 1 except that the last
phoneme had to be isolated. The words were
different from those in test 1.

4. Isolation of the last phoneme in
self-generated picture names. Similar to test 2 except
that the last phoneme in the name of each object
had to be isolated. The objects were different from
those in test 2.

The words and object names were selected in 
collaboration with teachers in the respective 
grades to be part of the children's vocabulary. 
They were one- to three-syllable words. Both
consonants and vowels were used as initial or last 
phonemes. 

Measures of phonological awareness. The 
phonological awareness score of each child was the 
percentage of correct responses across all four
sub-tests. In addition, two error scores were calculated
per subject: (1) the percentage of syllabic (rather
than phonemic) segmentation, and (2) the percentage
of sub-syllabic (i.e., consonant + vowel)
segmentation. This distinction was particularly 
desirable in this study because in Hebrew vowels 
are represented primarily by diacritical marks 
that are always appended to consonantal letters. 
Hence, the basic phonemic unit that is mostly 
emphasized by teachers during the processes of 
reading acquisition is bigger than a single 
phoneme, including a consonant and a vowel. In 
many cases, however, this CV unit does not form a 
syllable. Thus, it is possible that, unlike in Italian
or English, learning to read in Hebrew should
develop some awareness of sub-syllabic rather
than phonemic segments.
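
The three per-child measures defined above can be computed as follows. This is a minimal sketch: the response labels ("correct", "syllabic", "cv", "other") and the function name are hypothetical coding conventions introduced here for illustration, and the sketch assumes at least one response per child.

```python
from collections import Counter


def score_child(responses):
    """Return the awareness score and the two error scores as percentages.

    responses: one label per test item across all four sub-tests, where
    "correct" is a phonemic response, "syllabic" a whole-syllable response,
    "cv" a consonant + vowel (sub-syllabic) response, and anything else an
    unclassified error. The labels are assumptions of this sketch.
    """
    counts = Counter(responses)
    total = len(responses)

    def percentage(label):
        return 100.0 * counts[label] / total

    return {
        "awareness": percentage("correct"),
        "syllabic_errors": percentage("syllabic"),
        "cv_errors": percentage("cv"),
    }


# Toy example: 80 items (4 sub-tests of 20 items each).
toy_responses = ["correct"] * 60 + ["syllabic"] * 8 + ["cv"] * 10 + ["other"] * 2
toy_scores = score_child(toy_responses)
```

Keeping the two error types separate is what allows the later comparison of syllabic and CV segmentation errors across grades.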

Procedure. The entire sample was tested within 
the last two weeks of February. Hence, the school 
children had 5 months of reading instruction. The 
examiners were 20 students of education or 
psychology who received special training; they 
were sent at random to first grade classes and 
kindergartens and most tested both groups of 
children. 

The tests, which together lasted 30 to 40
minutes, were administered individually in a
separate room in the school (or kindergarten).
Before performing each task, the child was given a 
fixed number of practice items, preceded by an 
example. During practice, but not during the test, 
feedback was provided and errors were corrected. 

Results 

As expected, the percentage of correct responses
on the phonemic segmentation battery was higher
in school children (76%, SD=14%) than in the
kindergarten (35%, SD=23%), t(674)=29.12,
p<.0001. This difference reflects the combined
effects of age and schooling. The separate effects
of these two factors are revealed in the analysis of
the within-grade linear regressions of phonological
awareness scores on age (Figure 1).

Because the slopes obtained within each grade
level did not differ significantly, it was
assumed that the two regression lines were
parallel. Accordingly, the net effects of
chronological age and schooling were obtained
from the regression coefficients of age (in months)
and grade level in the multiple regression
equation of test scores on age and grade. The net
effect of one year difference in chronological age
was 9% (SE=3.0%), and the net effect of one year
of schooling was 32% (SE=3.4%) (see Figure 1).
Both effects and the difference between them were
significant (p<.05).
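
The parallel-lines model in the preceding paragraph amounts to regressing score on age and a grade indicator with a single common slope. The sketch below shows one way to obtain the two coefficients; the function name, the 0/1 coding of grade, and the toy data are assumptions made for illustration. With a two-level indicator, the least-squares age coefficient equals the pooled within-grade slope, which is what the code computes.

```python
def parallel_slopes(groups):
    """Fit score = b0 + b_age * age + b_grade * grade with a common age slope.

    groups: dict mapping the grade indicator (0 = kindergarten,
    1 = first grade) to a list of (age_in_months, score) pairs.
    Returns (b_age_per_month, b_grade).
    """
    sxx = 0.0
    sxy = 0.0
    means = {}
    for grade, rows in groups.items():
        n = len(rows)
        mean_age = sum(age for age, _ in rows) / n
        mean_score = sum(score for _, score in rows) / n
        means[grade] = (mean_age, mean_score)
        # Pool the within-grade sums of squares and cross-products.
        sxx += sum((age - mean_age) ** 2 for age, _ in rows)
        sxy += sum((age - mean_age) * (score - mean_score) for age, score in rows)
    b_age = sxy / sxx
    (age0, score0), (age1, score1) = means[0], means[1]
    # Difference between grade means, adjusted for the age difference.
    b_grade = (score1 - score0) - b_age * (age1 - age0)
    return b_age, b_grade


# Toy data constructed to mimic effects of 9 points per year of age
# and 32 points per year of schooling.
toy_groups = {
    0: [(age, 0.75 * age - 20) for age in range(61, 72)],
    1: [(age, 0.75 * age + 12) for age in range(73, 84)],
}
toy_b_age, toy_b_grade = parallel_slopes(toy_groups)
toy_age_effect_per_year = 12 * toy_b_age
```

Multiplying the monthly age coefficient by 12 puts the age effect on the same one-year scale as the grade coefficient, so the two can be compared directly.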

As would be expected, improved phonemic
segmentation, whether as a function of chronological
age or of schooling, was accompanied by a
reduction in the percentage of errors. Separate analyses
of the effects of schooling and age on syllabic and
sub-syllabic segmentation revealed that schooling
had a larger effect than aging in reducing both
types of errors. However, while schooling reduced
syllabic segmentation more than CV
segmentation, the effect of maturation was bigger on CV
than on syllabic segmentation (Table 1).

Table 1. Percentage (SD) of syllabic and sub-syllabic
segmentation errors made by kindergarten and first
grade children.

                        Kindergarten    Grade 1
Syllabic errors         12 (5)          8 (6)
Sub-syllabic errors     27 (13)         13 (7)

Discussion 

The results of the present study point to
schooling as a major factor affecting the
development of phonological awareness. While
they show that an age difference of one year
significantly improves performance on some
segmentation tests, the present results revealed
that the experience accumulated during the first
five months of schooling enhanced phonological
awareness four times as much. This effect was
impressive in both absolute and relative terms:
32% correct answers corresponds to an effect size
of 1.4 kindergarten standard deviations, which is
an unusually large effect.

In interpreting the schooling effect, we should
consider that we tested our sample during the last
two weeks of February. Hence, this effect is based
on only the first five months in school. Although 
during the first grade Israeli school children are 
involved in a variety of scholastic topics, the main 
curricular activity during the first half of the year 
is dedicated almost entirely to reading instruction. 
At the same time, the kindergarten activity 
includes no formal exposure to the alphabet. 
Consequently, we suggest that the schooling effect 
reflects primarily reading instruction and, 
therefore, that the present results support the 
contention that learning to read significantly 
enhances phonological awareness. 
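
The effect size and the "four times" comparison quoted at the start of the Discussion can be checked against the figures reported in the Results. The short calculation below assumes that the effect size was obtained by dividing the schooling effect by the kindergarten standard deviation (23%); that derivation is an inference from the reported numbers, not something the text states, and the variable names are illustrative.

```python
# Values as reported in the Results (percentage points).
schooling_effect = 32.0   # net effect of one year of schooling
age_effect = 9.0          # net effect of one year of chronological age
kindergarten_sd = 23.0    # SD of kindergarten scores

# Assumed derivation: schooling effect expressed in kindergarten SD units.
effect_size = schooling_effect / kindergarten_sd

# Ratio behind the statement that schooling counts about four times as much.
schooling_to_age_ratio = schooling_effect / age_effect
```

Under this assumption the effect size comes to roughly 1.4 standard deviations and the ratio to roughly 3.6, consistent with the rounded figures in the text.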

Additional support for a connection between 
reading instruction and the development of 
phonological awareness is provided by the 
analysis of errors. Indeed, the method of reading
instruction adopted by a great majority of Israeli
schools ("without secrets") emphasizes the sound
of individual orthographic segments. However, as
already mentioned, many orthographic segments
in Hebrew are, in fact, mapped into two
phonemes, a consonant and a vowel. Accordingly,
although schooling reduced errors caused by
sub-syllabic (CV) as well as syllabic segmentation, the
former were reduced less. This trend contrasts with the
usual findings in other languages, where a direct
transition from syllabic to phonemic segmentation
was observed (e.g., Cossu et al., 1988), and is best
explained by the specificity of the Hebrew
orthography. Thus, the schooling effect on the
pattern of errors suggests that reading instruction
fosters phonological awareness by manipulating
language-specific orthographic segments. The
latter hypothesis was supported by the results of a 
recent study of bilingual children (Bentin & Bork, 
unpublished). The results of that study showed 
that learning to read Hebrew improved perfor- 
mance on segmentation tests in English only 
about half as much as in Hebrew. 

The significant influence of the process of 
reading acquisition on the development of 
phonological awareness should not, however, be 
interpreted as evidence against the importance of 
phonological awareness for reading acquisition. In
fact, several studies revealed that improving
phonological skills in kindergarten has a positive
influence on reading acquisition (Bradley, 1989;
Bradley & Bryant, 1983, 1985; Bentin & Leshem,
in press; see also Perfetti, Beck, Bell, & Hughes,
1987; Vellutino & Scanlon, 1987; for a recent
review see Goswami & Bryant, 1990). Moreover,
the significant age effect that was observed in the
present study suggests that some form of
phonological awareness is achieved in
kindergarten and is independent of formal reading
instruction.

These data suggest that cognitive-linguistic 
skills that are necessary for achieving phono- 
logical awareness mature by the age of six, 
promoted by natural development and/or informal 
linguistic experience. It is possible that this 
maturation is a necessary condition for reading 
acquisition in the first grade to trigger phono- 
logical awareness. 

The significant within-grade (age) effect is more
difficult to interpret. Obviously, this effect can
be due to spontaneous cognitive maturation.
However, maturation is not the only possible
explanation. Six-year-old children are not only one
year older than five-year-old children but also
more experienced in areas that might be relevant
to phonological awareness. Although in Israel
formal instruction in the kindergarten does not
include learning the alphabet, the children are
informally exposed to orthographic symbols while
watching TV, seeing street signs, etc. The amount of
informal experience with letters is proportional to
age. Therefore, the within-grade increase in
phonological awareness observed in the present
study might reflect the increased linguistic
experience rather than "pure" cognitive maturation. In
other words, both the "grade level" and the "age
level" effects in the present study might have been
mediated by the same underlying factor, the
amount of experience with printed language.
Hence, the difference between the two effects
might reflect the difference between formal
reading instruction and informal experience with
printed language.

Before concluding, one caveat should be 
considered. In the present study, we tested 
phonological awareness by tests of phonemic 
segmentation. Other studies suggest that the 
present results might not be valid for other tests
of phonological awareness. For example, syllabic
segmentation ability is quite good in
kindergarten (Bentin & Leshem, in press;
Liberman et al., 1974), and sensitivity to
rhymes and alliteration develops naturally
between the ages of three and five, before
children can read (Maclean, Bryant, & Bradley,
1987). Different effects of literacy on phonemic
and syllabic or sub-syllabic segmentation were
also found in illiterate adults (Bertelson & de
Gelder, 1989; Bertelson, de Gelder, Tfouni, &
Morais, 1989). The latter study showed that illiterates
performed reasonably well in tests of vowel
deletion and rhyme judgment, but poorly on
consonant deletion. On the basis of their findings,
Bertelson et al. (1989) propose that phonological
awareness is a heterogeneous meta-linguistic
ability that involves separate components
which obey different developmental mechanisms.
Considering the existing pattern of evidence,
including our own, we adhere to this proposition.
We suggest that sensitivity to highly resonant
vocalic centers that form syllabic nuclei develops
naturally during speech perception. On the other
hand, explicit deciphering of coarticulated
individual phonemes and the ability to consciously
manipulate phonemic segments are significantly
enhanced by learning to read an alphabetic
orthography.

Bi-alphabetism and the Design of a Reading Mechanism*



Laurie Beth Feldman†



Evidence for alphabetically-defined visual effects was examined in six word recognition
studies with Serbo-Croatian materials. In each, the experimental manipulation exploited
the bi-alphabetic fluency of skilled adult readers and compared performance on a variety
of measures using successive presentations of words and pseudowords in the same or in
different alphabets. One line of investigation manipulated the number of intervening items in
a repetition priming version of the lexical decision task. A second line of investigation used
alphabet decision as a study phase prior to lexical decision. A third examined lexical
decision and naming latencies to targets in phonologically and graphemically similar and
dissimilar (prime) contexts. In none of these studies did alternating as contrasted with
preserving alphabet exert a significant effect on word recognition. Three additional related
lines of inquiry examined the effect of alphabetic context on words that are phonologically
ambiguous because they can be interpreted as either Roman or Cyrillic letter strings and
on words that are phonologically unambiguous because they can be interpreted in only one
way. Alphabetic context influenced the processing of phonologically ambiguous words but
not of unambiguous words both when the availability of the context was restricted, either
in its duration or by the presence of a pattern mask, and when it was not. It was concluded
that alphabetically-defined visual effects in Serbo-Croatian word recognition reveal
themselves under conditions of phonological complexity. Results are described in terms of
a connectionist model with letter-, phoneme-, and word-sized units where alphabetic effects
arise in the mapping between letter and phoneme levels.



The linguistic conditions in regions of 
Yugoslavia provide an ideal medium in which to 
investigate the role of a word's visual form in the 
process of word recognition. Specifically, two vi- 
sually distinct alphabets, Roman and Cyrillic, are 
used interchangeably and with impressive fluency 
by most skilled readers in the Belgrade region. 
Consequently, words of Serbo-Croatian, the offi- 
cial language of Yugoslavia, can be written in ei- 
ther the Roman or the Cyrillic alphabets and, ac- 
cording to the educational policy in effect until re- 
cently, all school children are required to demon- 
strate and maintain proficiency in both alphabets. 
The implication of the forgoing is that skilled 
readers of Serbo Croatian maintain two visually- 
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defined lexicons or at least, two visually-defined 
descriptions for each word. And, because most of 
the phonemes are unique to one alphabet or an- 
other, the visual similarity of the two alphabetic 
transcriptions of a word is dramatically reduced 
relative to the experimental manipulations of vi- 
sual form (e.g., case) that are possible in English. 
In addition, the writing system for Serbo-Croatian 
was reformed in the last century so that the map- 
ping of letter to sound is consistent and regular. 
The implication of a phonologically-regular writ- 
ing system is that skilled readers of Serbo- 
Croatian need never rely on word-level knowledge 
in order to arrive at the correct phonemic form of 
a word. 

The present chapter summarizes six lines of 
investigation using variations on the lexical 
decision and naming methodologies that were 
conducted with bi-alphabetically fluent readers of 
Serbo-Croatian (see Table 1). Collectively, they 
investigate the role of an alphabetically-defined 
(visual) level of description in word recognition. 
All of the studies exploit the particular relation 
between two alphabets that exists in Yugoslavia 
and all were conducted with first-year students at 
the University of Belgrade or with advanced high 
school students in the Belgrade region who are 
fluent in both alphabets. Studies One and Two fo- 
cus on the visual distinctiveness of orthographic 
forms. Specifically, most phonemes of Serbo- 
Croatian have two quite distinct visual forms, one 
Roman character and one Cyrillic character, and 
this variation provides a tool with which to ask 
whether multiple presentations that preserve 
alphabetically-defined visual patterns facilitate 
performance relative to presentations that 
alternate alphabet. Study three examines 
facilitation due to visual and phonological simi- 
larity for words presented close in succession. The 
remaining three studies exploit properties of the 
subset of characters that are shared by the two 
alphabets. Specifically, there are a small number 
of phonemes where the mapping between letter 
and phoneme is complex because the same visual 
characters are shared by both alphabets. Of these 
shared characters, the common characters (i.e., A, 
E, O, J, K, M, T) receive the same phonemic 
interpretation in both alphabets whereas the 
ambiguous characters (i.e., B, C, H, P) represent 
different phonemes in Cyrillic and in Roman (see 
Table 2). Comparisons between words composed 
exclusively of shared characters (i.e., words with 
two phonemic interpretations) and words that 
include at least one nonshared (i.e., alphabetically 
unique) character provide the basis of studies 
four, five, and six where the effect of alphabetic 
context on phonological processing is explored. To 
anticipate, this chapter will review a series of 
studies that explores the graphemic and phonemic 
implications of reading in two alphabets and will 
provide a model of word reading in Serbo-Croatian 
with its emphasis on phonology. Because the first 
two studies are not published and details are not 
easily obtained, they will be described in more 
detail than will subsequent studies. 
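
The classification that underlies studies four through six can be made concrete with a short sketch. The Python fragment below is illustrative only: the function name `classify` and the category labels are assumptions introduced here, and the letter sets are simply the shared characters listed above. It also assumes that shared letter shapes are typed as Latin code points, so that any other character (Roman or Cyrillic) counts as alphabetically unique.

```python
# Shared characters as listed in the text (upper case only).
COMMON = set("AEOJKMT")   # same phonemic interpretation in both alphabets
AMBIGUOUS = set("BCHP")   # different phonemes in Roman and in Cyrillic


def classify(letter_string: str) -> str:
    """Label a letter string by the kind of ambiguity it carries.

    Hypothetical helper for illustration; it is not taken from the chapter.
    """
    letters = set(letter_string.upper())
    if not letters <= COMMON | AMBIGUOUS:
        # At least one nonshared letter specifies the alphabet.
        return "alphabetically unique"
    if letters & AMBIGUOUS:
        # Only shared letters, at least one with two phonemic readings.
        return "phonologically ambiguous"
    # Only common letters: alphabet is undetermined, phonology is not.
    return "alphabetically ambiguous, phonologically unambiguous"


for s in ("VENA", "BEHA", "MAMA"):
    print(s, "->", classify(s))
```

Under these assumptions VENA falls in the first category, BEHA in the second and MAMA in the third, which matches the way those examples are used later in the chapter.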

Study 1: Alphabetic manipulations across 
repetitions of a word 

One way in which the bi-alphabetic fluency of 
readers of Serbo-Croatian has been exploited has 
been to investigate the role of alphabetically-de- 
fined orthographic similarity of prime and target 
in repetition priming (Feldman & Moskovljević, 
1987, Expt. 1). In this task, words and pseu- 
dowords are presented twice, with a lag of inter- 
vening items, and subjects are instructed to per- 
form a lexical decision to each letter string as it 
appears (Stanners, Neiser, Hernon & Hall, 1979). 
The critical experimental manipulation entailed 
repetitions in either the same or in different al- 
phabets. In the alphabet alternated condition, 
prime and target were transcribed in different al- 
phabets (e.g., НОГОМ-NOGOM). In the alphabet 
preserved condition, prime and target were in the 
same alphabet (e.g., NOGOM-NOGOM). Equal 
numbers of words and pseudowords were pre- 
sented for durations of 750 ms. The interval be- 
tween successive presentations of a word averaged 
10 items with a range of 7 to 13. One group of sub- 
jects saw all items in Roman script (alphabet pre- 
served) and the other saw primes in Cyrillic and 
targets in Roman (alphabet alternated). Results 
indicated that facilitation (i.e., reaction time to 
first minus second presentation) was numerically 
equivalent (viz., 90 ms) in the alphabet preserved 
and the alphabet alternated conditions. The au- 
thors interpreted this pattern of results as evi- 
dence that at lags of 7 to 13, visual similarity of 
prime and target alone did not provide a source of 
facilitation in the repetition priming task. 

Because it is possible that the time course of 
activation of visual form varies with lag (Monsell, 
1985; Ratcliff, Hockley, & McKoon, 1985), the first 
study attempted to replicate this finding. In 
addition, consistency of alphabet was 
systematically manipulated. Decision latencies to 
targets that were preceded by primes (where 
target and prime either alternated or preserved 
alphabet) were compared over lags of 10 and 20 
(Experiment 1a) or lags of 3 and 10 (Experiment 
1b) in an attempt to find evidence for facilitation 
based on repetitions of specific visual patterns. 
Materials consisted of thirty two Serbo-Croatian 
words and thirty two pseudowords. Words were 
familiar nouns in nominative case that contained 
three or four letters. Pseudowords were generated 
by changing one or two letters (vowel with vowel 
or consonant with consonant) and preserved 
orthographic and phonemic regularity. 

Each word and pseudoword appeared two times, 
once as a target and once as a prime and, as noted 
above, the lag or interval between presentation of 
prime and its target was varied. Half of the tar- 
gets were printed in upper case Roman and half 
were printed in upper case Cyrillic. And, at each 
lag, half of the prime-target pairs alternated al- 
phabet and half preserved it. Items were selected 
so that both alphabet transcriptions included at 
least one letter that uniquely specified alphabet 
(Feldman, Kostić, Lukatela, & Turvey, 1983). 
A small number of filler items were introduced to 
maintain the appropriate lags. Across test orders 
each target (word or pseudoword) was preceded by 
its prime at two different lags in both the alphabet 
alternated and alphabet preserved conditions. 
University students were tested individually in a 
lexical decision task. Each subject viewed one test 
order and a practice list of ten items preceded the 
test list. 
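
The notion of lag used in constructing the test orders can be illustrated with a small sketch. It assumes, as the description above suggests, that lag is counted as the number of items intervening between a prime and its target; the function name `repetition_lags` and the item labels are hypothetical, and each label stands for a word or pseudoword regardless of the alphabet in which it is printed.

```python
def repetition_lags(sequence):
    """Return {item: number of items between its first and second appearance}."""
    first_position = {}
    lags = {}
    for position, item in enumerate(sequence):
        if item not in first_position:
            first_position[item] = position          # prime
        elif item not in lags:
            # target: count only the items strictly between prime and target
            lags[item] = position - first_position[item] - 1
    return lags


# Hypothetical miniature test order: two repeated items and two fillers.
order = ["w1", "f1", "w2", "f2", "w1", "w2"]
print(repetition_lags(order))
```

A check of this kind is one way to confirm that filler items have produced the intended lags in a given test order.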

In addition to eliminating errors and extreme 
response times, responses were excluded when a 
subject responded incorrectly to one member of a 
prime-target pair. Table 3 summarizes the mean 
recognition times over subjects for target words 
and pseudowords as a function of lag for 
alphabetically alternated and preserved pairs. 

Analyses of variance on targets from 
Experiment 1a with lag (10, 20) and alphabet 
(alternated, preserved) as independent variables 
were performed separately for words and 
pseudowords using subjects (F1) and items (F2) as 
random variables. For pseudowords, no effects or 
interactions were significant. For words, the effect 
of alphabet was marginally significant in the 
analysis of latencies by subjects, F1(1,35) = 4.08, 
MSe = 967, p < .051, but did not approach 
significance in the analysis by items. Neither the 
effect of lag nor the interaction of alphabet by lag 
was significant. Similarly with errors, no main 
effects or interactions approached significance. 
Finally, the pattern observed with errors did not 
support the latency pattern. 

When lags of 3 or 10 items separated prime and 
target, analyses performed on latencies for target 
items alone revealed no significant effect of 
alphabet, no effect of lag and no interaction. 
Analogously, the error scores were not sensitive to 
manipulations of lag or alphabet. The interaction 
of lag by alphabet was not significant for 
pseudowords. 

The present study exploited the bi-alphabetic 
knowledge of Yugoslav readers in order to 
investigate the role of visually-defined similarity 
as a source of facilitation in the repetition priming 
paradigm. In contrast to the design used in 
Feldman and Moskovljević (1987), the present 
design treats alphabet consistency of prime and 
target as a within-subjects variable. Two 
experiments were conducted and, across 
experiments, the average lag was manipulated. In 
neither experiment was the effect of lag 
significant for words. Neither at a lag of 3 nor at a 
lag of 20 did facilitation differ significantly from 
lag 10. 



Table 3. Mean decision latencies (ms) and errors for words and pseudowords in the alphabet preserved and alphabet 
alternated conditions of the repetition priming task for Study 1. 

                                 First                 Repetition Alphabet
                       Measure   Presentation   Lag   Alternated   Preserved   Difference

Experiment 1a
  words                latency   651            10    601          592           9
                       errors    12.7                 6.6          7.3
                       latency                  20    607          595          12
                       errors                         4.5          7.3          -2.8
  pseudowords          latency   666            10    682          680           2
                       errors    5.7                  5.2          5.9          -0.7
                       latency                  20    672          661          11
                       errors                         4.2          5.9          -1.7

Experiment 1b
  words                latency   628             3    562          562           0
                       errors    10.8                 8.3          5.9           2.4
                       latency                  10    567          573          -6
                       errors                         7.9          7.9           0
  pseudowords          latency   654             3    672          665           7
                       errors    5.7                  7.6          6.9           0.7
                       latency                  10    648          629          19
                       errors                         6.1          6.9          -0.8

The main finding was that for both words and 
pseudowords, significant target facilitation 
occurred when primes appeared in either the same 
alphabet or in a different alphabet from the 
target. Importantly, target facilitation was no 
greater in the alphabet preserved condition than 
in the alternating condition. Small numerical 
differences that were sometimes observed with the 
latency measure were not supported by the error 
measure. The intent of the alphabet decision 
study was to demonstrate an effect of prior 
experience with specific visual forms of words and 
pseudowords on subsequent lexical decision 
performance with those same materials. The 
experiment exploited a special characteristic of 
Serbo-Croatian, notably the multiple mapping 
from phoneme to graphemes that exists because 
readers are fluent in both the Roman and Cyrillic 
alphabets. Facilitation defined either in terms of 
the difference between first and second 
presentations or as a percent decrease in lexical 
decision latency (relative to the first presentation) 
was not significantly different for alphabet 
preserved and alphabet alternating conditions. 
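
The two definitions of facilitation mentioned above amount to simple arithmetic on mean latencies, as the following sketch shows. The function names and the latencies of 650 and 600 ms are hypothetical values chosen for illustration; they are not data from the study.

```python
def facilitation_ms(first_rt, repeat_rt):
    """Facilitation as first minus second presentation latency (ms)."""
    return first_rt - repeat_rt


def facilitation_percent(first_rt, repeat_rt):
    """Facilitation as a percent decrease relative to the first presentation."""
    return 100.0 * (first_rt - repeat_rt) / first_rt


# Hypothetical mean latencies (ms) for a first presentation and its repetition.
first, repeat = 650, 600
print(facilitation_ms(first, repeat))                 # 50
print(round(facilitation_percent(first, repeat), 2))  # 7.69
```

The percent measure scales the difference by the baseline latency, which can matter when conditions differ in their first-presentation latencies.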

Words presented and represented in the same 
alphabet are more visually similar than are the 
Roman and Cyrillic transcriptions of a word. Yet, 
in the repetition priming task where several items 
intervened between first and second presenta- 
tions, no significant increment to facilitation was 
observed on the alphabet preserved trials relative 
to the alphabet alternating trials. This outcome is 
not surprising if, as Masson and Freedman (1990) 
have claimed, visual analysis (e.g., improved per- 
ceptual sensitivity) is not responsible for the rep- 
etition effect (p. 356) but rather, the bases of facil- 
itation for repeated items are more conceptual in- 
terpretive processes that are associated with a 
shift in decision bias. Perhaps, because of the na- 
ture of the experimental task, an analysis of the 
alphabet manipulation within a repetition prim- 
ing task cannot provide compelling evidence for 
the role of visual analysis and orthographic repre- 
sentations in word recognition. 

Study 2: Alphabetic manipulations in an 
alphabet decision task 

The pattern of facilitation in the repetition 
priming task with a within-subjects manipulation 
of alphabet provided no evidence that, in the 
course of visual word recognition, subjects are 
constrained by an orthographic representation 
based on the visual form of the letter string. 
Although the previous task did not foster a visual 
analysis, it is plausible that skilled readers of 
Serbo-Croatian who are fluent in two alphabets 
can, under the proper circumstances, engage in an 
analysis of a letter string that retains its visual 
characteristics and this is the focus of the second 
study. In the first phase of study 2, subjects were 
told to attend to the alphabetic characteristics of 
the letter strings that they encountered. They were 
instructed to indicate the alphabet in which each 
letter string was printed by a key press. In a 
second phase, they were asked to make a lexical 
decision to those same letter strings. The goal was 
to try to induce subjects to attend to the visual 
attributes of the materials that they encountered 
in an attempt to demonstrate that skilled readers 
of Serbo-Croatian can attend to the visual 
characteristics of a letter string. 

Forty-four first year students from the 
Department of Psychology at the University of 
Belgrade participated in the experiment. Half of 
the subjects participated in an alphabet decision 
task and then in a lexical decision task. The 
remaining half participated only in the lexical 
decision task. Experimental targets consisted of 
forty Serbo-Croatian words and forty 
pseudowords. Words were familiar nouns in 
nominative case that contained three or four 
letters. As in the previous study, pseudowords 
were generated by changing one or two letters 
(vowel with vowel or consonant with consonant) 
and preserved orthographic and phonemic 
regularity. In both the alphabet decision and the 
lexical decision phases of study 2, half of the 
words and half of the pseudowords were printed in 
Roman and half were printed in Cyrillic. Items 
were selected so that both alphabet transcriptions 
included at least one letter that uniquely specified 
alphabet. 

As each letter string appeared on the CRT of an 
Apple II in the alphabet decision task, subjects 
pressed either of two telegraph keys with both 
hands to indicate alphabet. In the second phase of 
the experiment, the same words and pseudowords 
were presented to subjects in a different order. 
Subjects performed a lexical decision to each letter 
string. The presentation format was identical to 
the alphabet decision phase described above. 
Reaction time was measured from the onset of the 
letter string. 

In the lexical decision phase, as in the alphabet 
decision phase, half of the items were in Cyrillic 
and half were in Roman and words and 
pseudowords were equally represented in each 
alphabet. In the lexical decision phase, however, 
half of the words and half of the pseudowords 
preserved the alphabet of their earlier 



ERLC 



16^ 



Bi-dlphabelism and the Design of a Raiding Mechanism 



153 



presentation and half alternated alphabet. In this 
study, alphabet (preserved or alternated) and 
lexicality (word or pseudoword) were manipulated 
within subjects and prior participation in the 
alphabet decision task was manipulated between 
subjects. Results revealed a significant effect of 
prior alphabet decision on performance in the 
lexical decision task. Subjects who participated in 
lexical decision following alphabet decision were 
significantly slower than subjects who 
participated only in the lexical decision task. This 
outcome is consistent with the observation that 
repetition effects are sensitive to the task at 
initial presentation and do not always reveal 
themselves as facilitation (Forster & Davies, 1984; 
Ratcliff et al., 1985; Bentin & Peled, 1990). 
Subsequent analyses were conducted on the 
lexical decision following alphabet decision data. 

Mean latencies and error scores for the lexical 
decision phase are summarized in Table 4. (Scores 
greater than 1200 ms or less than 400 ms were 
treated as errors and eliminated from the reaction 
time analyses.) An analysis of variance on 
latencies revealed a significant effect of lexicality 
F1(1,21) = 14.98, MSe = 1110, p < .001; F2(1,78) = 
10.16, MSe = 3147, p < .003. Neither the effect of 
alphabet nor the interaction of lexicality by 
alphabet approached significance. No effects were 
significant with errors as the dependent measure 
and the small numerical differences diverged in 
direction from the small latency differences. 
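
The exclusion rule stated above can be sketched as follows. The cutoffs of 400 and 1200 ms are those given in the text; the function name `trim_latencies`, the trial format, and the treatment of incorrect responses as excluded from the latency means are assumptions made for the example, and the trial values are invented.

```python
def trim_latencies(trials, low=400, high=1200):
    """Split trials into usable latencies and an error count.

    trials: iterable of (rt_ms, correct) pairs.
    Latencies below `low` or above `high` are scored as errors.
    """
    kept, errors = [], 0
    for rt, correct in trials:
        if not correct or rt < low or rt > high:
            errors += 1
        else:
            kept.append(rt)
    return kept, errors


# Invented trials: one too fast, one too slow, one incorrect response.
trials = [(650, True), (380, True), (1250, True), (700, False), (1200, True)]
kept, errors = trim_latencies(trials)
print(kept, errors)  # [650, 1200] 3
```

Note that a latency of exactly 1200 ms is retained here, because the text excludes only scores greater than 1200 ms.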

Table 4. Mean decision latencies (ms) and errors for 
words and pseudowords in the lexical decision phase of 
the alphabet decision task. 

                                   Alphabet
               Measure   Alternated   Preserved   Difference

words          latency   712          702         10
               errors    3.9          4.6         0.7
pseudowords    latency   737          732         5
               errors    2.3          3.9         4.6



The intent of the alphabet decision study was to 
demonstrate an effect of prior experience with 
specific visual forms of words and pseudowords on 
subsequent lexical decision performance with 
those same materials. By using both Roman and 
Cyrillic characters, orthographic form was either 
preserved or alternated across the alphabet and 
lexical decision phases of the study. The logic of 
the first phase of the study was to direct subjects 
to attend to alphabet and their accuracy levels 
proved that they could do this. The effect of 
attending to alphabet on subsequent word 
recognition was then examined. 

Relative to performing a word level task in 
isolation, subjects were slower when they 
performed a letter level task such as alphabet 
decision prior to performing a word level task. The 
analysis of decision latencies in the second phase 
revealed a significant effect of lexicality on 
decision latency but no effect of alphabet. With 
respect to visual effects, viewing a word or a 
pseudoword twice in the same visual form 
(alphabet preserved) exerted no effect over and 
above the effect of viewing a word (or a 
pseudoword) once in its Roman form and once in 
its Cyrillic form (alphabet alternated). Moreover, 
the small numerical differences that were 
observed with the latency measure for the factor 
of alphabet were not supported by the accuracy 
measure. It appears that for recognition tasks at 
the level of the word, skilled readers of Serbo- 
Croatian, who tend to be equally fluent in both 
alphabets (Feldman & Moskovljević, 1987, footnote 
1), cannot benefit from multiple presentations of 
alphabet-specific orthographic forms. 

In a repetition priming task (study 1) and in an 
alphabet decision task that explicitly directed 
skilled readers to attend to alphabet (study 2), no 
effects of orthographic repetition were observed. 
While this is a null effect and it is possible that 
another task will be developed in which effects of 
alphabet-specific orthographic form can be 
demonstrated, it is evident that in two quite 
different word recognition tasks skilled readers do 
not appear to rely on a style of analysis that is 
primarily tied to the visual form of a word. 

Study 3: Manipulations on alphabetic and 
phonemic similarity 

It is plausible that the experimental conditions 
in the first two studies where repetitions were 
separated by a number of intervening items could 
not reveal effects of preserving or alternating 
alphabet because the interval between successive 
presentations exceeded the duration over which 
alphabet effects can persist. Alternatively, or 
conjointly, it is possible that no alphabetic effects 
were evident because all target items included at 
least one letter that uniquely specifies alphabet 
and alphabet effects emerge only when alphabet 
context is not well-specified. Accordingly, in a 
third line of investigation using a priming 
paradigm (Lukatela & Turvey, 1990a), alphabetic 
effects at short lags are examined for target words 
that contain at least one unique letter. 

In traditional priming paradigms, target items 
are immediately preceded by a context or prime 
and in some experimental conditions, the context 
is related to the target along some dimension. 
Target latencies with and without related primes 
are compared. In contrast to the previous two 
studies where the first and second items were 
separated by other intervening items, in the third 
study, phonologically unambiguous targets were 
immediately preceded by contexts. Moreover, 
these contexts were related with respect to the 
dimensions of phonology, graphemic form, both or 
neither and subjects performed either a naming or 
a lexical decision task. Primes and targets were 
displayed serially, one immediately after the other 
in a presentation format that was likely to 
enhance similarity effects between prime and 
target. 

First item (prime) and second item (target) 
consisted of either words or pseudowords. Items 
were phonemically matched or mismatched and 
were visually similar (alphabet preserved) or 
dissimilar (alphabet alternated). Primes appeared 
above the position of targets and disappeared 100 
ms before the target was presented. Effects of 
phonological similarity were significant but 
direction varied with task. Visually similar primes 
had the same effect on target latencies as did 
visually dissimilar pairs in both the phonologically 
matched and the nonmatched conditions. For 
example, when primes and word targets differed 
in their initial phoneme and rhymed (i.e., 
phonologically similar condition), the difference 
between alternated (e.g., РАКУН-RACUN) and 
preserved alphabet (e.g., RAKUN-RACUN) 
latencies was 13 ms (0.22%) in lexical decision 
(Experiment 1; Lukatela & Turvey, 1990a) and 6 
ms (0.43%) in naming (Experiment 5; Lukatela & 
Turvey, 1990a). Similar effects were observed for 
pseudoword targets. Effects of alphabet in the 
phonologically unmatched conditions of those 
experiments were even smaller. Stated generally, 
in study 3, preservation or alternation of alphabet 
was used as a manipulation of visual similarity 
and no effect of alphabetic similarity was observed 
for target letter strings that, because of the 
presence of at least one unique letter, were well- 
specified with respect to alphabet. Under 
sequential presentation conditions at inter- 
stimulus intervals of 100 ms there was no effect of 
graphemic similarity over and above the effect of 
phonological similarity. 

The present result contrasts with analogous 
experiments conducted with English materials 
where phonemic similarity effects are difficult to 
obtain (compare Martin & Jensen, 1988 with 
Hillinger, 1980 and Meyer, Schvaneveldt & 
Ruddy, 1974, for example). With Serbo-Croatian 
materials, a robust effect of phonemic similarity 
was observed in the lexical decision task. 
Moreover, the direction of this effect depended on 
the position of the nonmatched letter and on the 
relative frequency of the context and target word. 
Relative to a phonologically dissimilar context, 
target-context pairs that differed in their initial 
letter showed facilitation (+55 ms) whereas pairs 
that differed on a medial letter showed slowing 
(-27 ms) (Experiment 2; Lukatela & Turvey, 1990a). 
Pairs with low target familiarity (uncommon 
words and pseudoword targets) showed 
facilitation (+51 ms) whereas high familiarity 
(word) targets showed slowing (-21 ms) 
(Experiments 3 and 4; Lukatela & Turvey, 1990a). 

In the naming task, in contrast to the lexical 
decision task, facilitation due to phonological 
similarity was observed for both words and 
pseudowords with both initial and medial letter 
differences between context word and target. As in 
the lexical decision task, alphabetically-defined 
visual effects were never significant. Target 
familiarity had no effect (Experiments 5 and 6; 
Lukatela & Turvey, 1990a) although, in naming, 
differences in word stress between context and 
target eliminated the effect of phonemic similarity 
(Experiment 9; Lukatela & Turvey, 1990a). When 
targets were highly familiar words and contexts 
were either real words or pseudowords, 
facilitatory effects of phonological similarity were 
observed in naming for both word and pseudoword 
contexts (Experiment 1; Lukatela, Carello & 
Turvey, 1990). In lexical decision, by contrast, 
phonemically similar word contexts produced 
inhibition while phonemically similar pseudoword 
contexts produced facilitation relative to 
dissimilar pairs (Experiment 2; Lukatela, Carello, 
& Turvey, 1990). 

The effects of phonemic similarity of context and 
target were modelled as a network of letter, 
phoneme and word units such that constraints on 
the lexical decision task arise primarily at the 
level of word units that are partially activated by 
the phonemic units activated by the context. In 
the course of partially activating word units 
similar to the target (which generate inhibition to 
the target), phonemically similar contexts will also 
enhance the activation of the letter and phoneme 
units which comprise the target (Lukatela & 
Turvey, 1990a). In general, the dependence of 
context-target phonemic similarity on target 
familiarity in lexical decision reflects the balance 
between inhibitory effects at the word level and 
excitatory effects at the letter and phoneme levels. 

By contrast, the primary source of constraint on 
the naming task arises at the level of phonemic 
units which are sensitive to inputs from both the 
letter and word levels. The states and inhibitory 
relations among word units partially activated by 
the context are essentially irrelevant although it 
is important to point out that naming a word 
benefits from activation of a word unit (and 
subsequent reinforcement of its phonemic 
constituents) in a way in which naming a 
pseudoword cannot. For letter strings that are 
well-specified with respect to alphabet, in both the 
lexical decision and naming tasks, effects of 
phonemic similarity arise from the use of 
phonological information activated by the context 
in the course of processing the target but no 
distinct influence of letter level (i.e., alphabetic) 
activation on phonological activity is evident. 

Study 4: Alphabetic manipulations with 
phonological consequences 

A fourth and very productive line of investiga- 
tion into the consequences of two alphabetic sys- 
tems probes the status, for the skilled reader, of 
words that are composed exclusively of letters that 
are shared by both alphabets. Results provide evi- 
dence of mandatory phonological processes prior 
to lexical access in word recognition tasks. As de- 
scribed in Table 2 (see also Turvey, Feldman, & 
Lukatela, 1980), some of these shared letters re- 
ceive the same phonemic interpretation in both 
alphabets whereas others are phonemically am- 
biguous in that they receive different interpreta- 
tions in Roman and in Cyrillic. Words composed 
exclusively of shared letters with the same 
phonemic interpretation in both alphabets (e.g., 
MAMA, JAJE) are alphabetically ambiguous but 
well-specified phonologically. Words composed of 
shared letters with two phonemic interpretations 
(and no alphabetically unique letters) are phono- 
logically as well as alphabetically ambiguous in 
that they can be pronounced according to the 
grapheme-phoneme correspondence rules of 
Roman or those of Cyrillic or by a combination of 
the two. 

Consider the word BEHA which contains two 
phonologically ambiguous letters (viz., B, H) and 
two (alphabetically ambiguous but phonologically 
unique) common letters (viz., E, A). Interpreted 
as a Cyrillic letter string, it is pronounced /vena/ 
which means 'vein.' Interpreted as a Roman 
letter string, it is pronounced /bexa/ which is not a 
word in Serbo-Croatian, although it is a phonolog- 
ically legal combination. A frequently replicated 
finding is that when skilled readers of Serbo- 
Croatian are presented with phonologically 
ambiguous letter strings in either the lexical 
decision or naming tasks, their responses 
are significantly slowed relative to their response 
latencies for phonologically unambiguous letter 
strings. In one study (Lukatela, Savić, 
Gligorijević, Ognjenović, & Turvey, 1978), both the 
design of the experiment and the instructions to 
the subjects were created to restrict the task to 
the Roman alphabet: No letter strings contained 
uniquely Cyrillic letters, and subjects were asked 
to judge whether a letter string was a word by its 
Roman reading. In a following study (Lukatela, 
Popadić, Ognjenović, & Turvey, 1980), no alphabet 
restriction was imposed on lexical decision and the 
word interpretation could occur in either the 
Roman interpretation, the Cyrillic interpretation, 
both or neither. In both experiments, the 
prolonged decision times to all phonologically 
bivalent letter strings as compared to 
phonologically unambiguous letter strings 
suggested that subjects are unable to suppress 
multiple phonological interpretations when 
permitted by a letter string. Because phono- 
logically unambiguous letter strings with and 
without alphabet ambiguity produced equivalent 
results (e.g., MAMA which can be interpreted as 
either a Roman or a Cyrillic word was no slower 
than ЖАБА which can only be interpreted as a 
Cyrillic string), this outcome was interpreted as 
evidence of phonological as contrasted with 
alphabetic ambiguity and it was concluded that 
lexical access always proceeds with reference to 
phonology. 
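
The multiple phonological interpretations of a bivalent letter string can be enumerated mechanically, as in the sketch below. The readings of B and H are those implied by the BEHA example; the readings of C and P are the standard values of those letters in the two alphabets and are supplied here as an assumption, since Table 2 is not reproduced in this section. The function name `readings` and the ASCII phoneme labels are likewise introduced for illustration, and shared letters are assumed to be typed as Latin code points.

```python
from itertools import product

# Phonemic values of the shared letters (ASCII labels for illustration).
COMMON = {"A": "a", "E": "e", "O": "o", "J": "j", "K": "k", "M": "m", "T": "t"}
ROMAN = {"B": "b", "C": "ts", "H": "x", "P": "p"}
CYRILLIC = {"B": "v", "C": "s", "H": "n", "P": "r"}


def readings(letter_string):
    """List every phonemic reading of a string made only of shared letters."""
    options = []
    for ch in letter_string.upper():
        if ch in COMMON:
            options.append([COMMON[ch]])               # one reading
        elif ch in ROMAN:
            options.append([ROMAN[ch], CYRILLIC[ch]])  # two readings
        else:
            raise ValueError(f"{ch!r} is not a shared letter")
    return ["".join(combo) for combo in product(*options)]


print(readings("BEHA"))  # ['bexa', 'bena', 'vexa', 'vena']
print(readings("MAMA"))  # ['mama']
```

BEHA yields the pure Roman reading /bexa/, the pure Cyrillic reading /vena/ and two mixed readings, whereas MAMA yields a single reading despite being alphabetically ambiguous.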

A feature of the two experiments cited above 
(Lukatela et al., 1978; Lukatela et al., 1980) was 
that different words appeared in the phonologi- 
cally unique and phonologically ambiguous condi- 
tions. That is, the effect of a letter string's phono- 
logical ambiguity was assessed by comparing 
recognition latencies of different words, some of 
which were phonologically ambiguous and some of 
which were not. Similarly, the effect of a letter 
string's alphabetic ambiguity was assessed by 
comparing recognition latencies of different 
(phonologically unambiguous) words, some of 
which were alphabetically ambiguous (e.g., 
MAMA, JAJE) and some of which were not (e.g., 
ЖАБА, ŽABA). 

In a later experiment, the effect of phonological 
ambiguity was assessed by comparing decision 
(Feldman & Turvey, 1983) and naming (Feldman, 
1981) latencies to the ambiguous and unique tran- 
scriptions of the same word. For example, the 
Serbo-Croatian word for 'vein' is, as noted above, 
written BEHA in Cyrillic characters but, in 
Roman characters, that word is written VENA. 
Both forms are meaningful and are equated with 
respect to variables such as frequency, meaning 
and word length because they are forms of the 
same word. They differ, however, in that BEHA 
permits an alternative phonological interpretation 
(viz., /bexa/) whereas VENA does not. 
Comparisons between two alphabetic transcrip- 
tions of the same word, only one of which is 
phonologically ambiguous provide the basis of the 
within-word assessment of phonological complex- 
ity on word recognition in Serbo-Croatian known 
as the phonological ambiguity effect (PAE). 
Sometimes, differences as large as 300 ms have 
been observed between the ambiguous and unique 
alphabet transcriptions of a word although, among 
other factors, the magnitude of the PAE difference 
is sensitive to the number of ambiguous charac- 
ters in the ambiguous form (Feldman, Kostić, 
Lukatela & Turvey, 1983; Feldman & Turvey, 
1983). PAE effects have also been observed for 
ambiguous letter strings where neither (Feldman 
& Turvey, 1983) or both of the readings are mean- 
ingful (Frost, Feldman, & Katz, 1990). 

To state the PAE outcome in a general way, 
prolonged latencies in naming and lexical decision 
have been observed for BEHA type words as 
contrasted with VENA type words but not for 
MAMA type words as contrasted with ЖАБА or 
ŽABA type words. This outcome has been 
interpreted as reflecting activation of more 
phonemic units and competition among the word 
units to which they are linked (Feldman & 
Turvey, 1983; Feldman, Kostid, Lukatela & 
Turvey, 1983). A model, foreshadowed in the 
preceding discussion of phonemic and alphabetic 
similarity effects, has been proposed (I^ikatela, 
Turvey, Feldman, Carello, & Katz, 1989). It 
consists of three types of imits, letter, phoneme 
and word, and the linkages between them. At the 
level of the letter, the elements of the Cyrillic and 
Roman alphabets constitute functionally distinct 
sets. Shared letters with one phonemic 
interpretation (viz., A, E, O, J, K, M, T) are 
conunon to the two sets. Shared letters with two 
phonemic interpretation (viz., B, C, H, P) are 
represented in each alphabet set. That is, 
ambiguous letters are represented two times at 
the letter level. 

At the level of the phoneme, by contrast, there is
no duplication. Two grapheme units link to each
phoneme unit (except for the shared letters that
have the same phonemic interpretation in two
alphabets). For example, Cyrillic Ф and Roman F
both connect to /f/ and Cyrillic B and Roman V
both connect to /v/ whereas A, which is both a
Cyrillic and a Roman character, is the only unit
that connects to /a/. The pattern of
linkages between letter and phoneme units
captures the relatively simple relation between
letter and phoneme that characterizes the Serbo-
Croatian language relative to a language such as
English.
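
The linkage pattern just described can be sketched
as a small lookup structure. The Python fragment
below is only an illustration under stated
assumptions, not the implementation of the model:
the names LETTER_TO_PHONEME and
letters_per_phoneme, the tuple encoding of letter
units, and the handful of units included are all
hypothetical choices made for the example.

```python
from collections import defaultdict

# Hypothetical encoding of letter units: (alphabet set, letter) -> phoneme.
# Only a few units are listed; the full alphabets are not reproduced here.
LETTER_TO_PHONEME = {
    ("shared", "A"): "a",      # one unit common to both alphabet sets
    ("Cyrillic", "B"): "v",    # ambiguous letter, Cyrillic copy
    ("Roman", "B"): "b",       # ambiguous letter, Roman copy
    ("Roman", "V"): "v",       # letter unique to the Roman alphabet
    ("Cyrillic", "H"): "n",    # ambiguous letter, Cyrillic copy
    ("Roman", "H"): "x",       # ambiguous letter, Roman copy
    ("Roman", "N"): "n",       # letter unique to the Roman alphabet
}


def letters_per_phoneme(links):
    """Invert the links so that each phoneme lists the letter units feeding it."""
    inverted = defaultdict(list)
    for unit, phoneme in links.items():
        inverted[phoneme].append(unit)
    return dict(inverted)


PHONEME_TO_LETTERS = letters_per_phoneme(LETTER_TO_PHONEME)
print(PHONEME_TO_LETTERS["v"])  # [('Cyrillic', 'B'), ('Roman', 'V')]
print(PHONEME_TO_LETTERS["a"])  # [('shared', 'A')]
```

In this sketch the ambiguous letters B and H appear
twice at the letter level, once per alphabet set,
while each phoneme appears only once.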

In the proposed network, word units are
activated from phonemic units in a two-way
interactive process. Each word unit represents a
particular ordering of phonemic units. When a
word unit is activated, the units at the letter and
phoneme levels for each letter position in that
word are reinforced. It is also assumed that there
are multiple inhibitory connections (in both
directions) between the unique letters of one
alphabet and the unique letters of the other. So,
for example, when a unique Cyrillic letter is
activated in one position, then the activity level of
all Roman letters in complementary positions is
reduced. The strength of inhibition varies as a
function of the number of activated units that are
unique to one alphabet. In a similar manner, the
strength and pattern of activation that gives rise
to the PAE varies as a function of the number of
ambiguous units that are present (Feldman et al.,
1983; Feldman & Turvey, 1983).

Consider a word such as BEHA which has 
phonemically ambiguous letters in the first and 
third positions. Eadi of tiiese letters will activate 
two phonemic units (viz., B activates /b/ and M; H 
activates fx/ and /n/). Compare it with the Roman 
transcription of that same word, VENA, which has 
alphabetically unique letters in the first and third 
positions. The presence of unique Roman 
characters will decrease the activation of Cjrrillic 
alphabet units and the phonemic units activated 
by them. (For the two versions of this word, the 
number and identity of shared unambiguous 
letters is the same.) Activation at the phonemic 
level will feed to word level units where intralevel 
inhibitory influences will generate a complex 
pattern of excitatory and inhibitory influences. 
(Senerally, phonemic input from BEHA type words 
¥nll be enhanced relative to input from VENA 
type words, (see Figure la & b) And, in the 
terminology of interactive models such as 
McClelland and Rumelhatt (1981), phonologically 
ambiguous BEHA type words require more 
operational cycles to settle on a single word unit 
than do phonologically unambiguous VENA type 
words (Lukatela, Turvey, & Todorovie, 1991). 
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Strings that include a unique letter will, at the 
letter level, activate the alphabet of the unique 
letter and (partially) inhibit the letters of the 
complementary alphabet. Consequently, the 
composition of strings with ambiguous letters 
becomes less salient when the string includes a 
unique letter. For example, the Cyrillic word 
BEHI which is the dative case of the word 
meaning ^ein* includes a unique letter as its affix 
in ike word final position as well as ambiguous 
letters in the first and third positions (the first 
three letters comprise the base morpheme). When 
activated, Cyrillic I will reduce the potential 
activation of Roman letters in other positions, that 
is, the Roman reading of B and H (see Table Ic). 

In comparison with BEHA type words where 
both the Roman and the Cyrillic phoneme units 
are activated in both the B and H letter positions, 
the presence of I in BEHI type words will tend to 
excite the Cyrillic phoneme units of B and H and 
reduce activation of the analogous Roman units. 
Consequently, as activation spreads from the 
phoneme to the word level, the number of highly 
activated word units will be fewer for BEHI type 
words than for BEHA type words. Accordingly, 
lexical decision latencies should be faster for 
BEHI type words than for BEHA type words and, 
in fact, latencies for BEHI words were not 
significantly different than those of VENI type 
words (see Table ld).in a lexical decision task 
(Feldman et al., 1983; Feldman, 1991). Similar 
effects were also observed in a naming task 
(Feldman, 1991). 

Table 5. Mean decision and naming latencies (ms) and
errors for ambiguous and unambiguous base
morphemes with ambiguous and unambiguous affixes
(from Feldman, 1991).

                              Base Morpheme
Affix                   Ambiguous  Unambiguous  Difference

Lexical decision
  ambiguous    latency     729         671          58
               errors  [illegible]    14.3         13.7
  unambiguous  latency     611         664          13
               errors   [missing]      6.6          2.6

Naming
  ambiguous    latency     616         588          28
               errors      25.9       12.5         13.4
  unambiguous  latency     626         613          13
               errors      17.6       11.8          5.8
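
As a worked check on how the naming rows of
Table 5 are read, the short Python sketch below
recomputes the ambiguity effect (ambiguous base
minus unambiguous base) for each affix type. The
latencies are taken from the table; the variable
and function names are hypothetical and serve only
the illustration.

```python
# Naming latencies (ms) from Table 5, keyed by affix type:
# (ambiguous base morpheme, unambiguous base morpheme)
NAMING_LATENCIES = {
    "ambiguous affix": (616, 588),
    "unambiguous affix": (626, 613),
}


def ambiguity_effect(cells):
    """Latency cost of an ambiguous base relative to an unambiguous base."""
    ambiguous_base, unambiguous_base = cells
    return ambiguous_base - unambiguous_base


EFFECTS = {affix: ambiguity_effect(cells)
           for affix, cells in NAMING_LATENCIES.items()}
print(EFFECTS)  # {'ambiguous affix': 28, 'unambiguous affix': 13}

# How much smaller the effect is when the affix specifies the alphabet.
REDUCTION = EFFECTS["ambiguous affix"] - EFFECTS["unambiguous affix"]
print(REDUCTION)  # 15
```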



Stated generally, the presence within an isolated
letter string of a single character that
unequivocally specifies alphabet can bias the
activation from letter to phonemic units. This
outcome is significant in consideration of the three
previous studies where alphabetic manipulations
exerted no influence on the processes of word
recognition. In the present study, it is evident that
an effect of alphabet ambiguity reveals itself when
a letter string contains no unique letters to guide
alphabet identification. That is, alphabetically-
defined visual effects are linked to the phonological
characteristics of a word and reveal themselves
when a word is phonologically complex. In the last
two studies, the domain of alphabet bias is
investigated by manipulating the temporal relation
between a target and a context that includes
unique letters. Transient effects of alphabetically-
specified contexts on targets that are and are not
composed exclusively of letters that are shared by
both alphabets are examined.

Study 5: Alphabetic manipulations on 
phonological ambiguity 

A fifth line of investigation into the effects of 
alphabetic bivalence on word recognition and 
hence a potential source of evidence for an 
alphabetically-specified orthographic contribution
to word recognition entailed primed lexical 
decision and naming tasks. For target words 
consisting of phonologically ambiguous strings, 
plausible related contexts include what the words 
mean (viz., a semantic associate) and which 
alphabet yields a word interpretation (viz., 
alphabetically consistent) as well as a combination 
of the two. 

As described above, some words in Serbo-
Croatian can be phonologically ambiguous in
either their Cyrillic or their Roman form. For
example, BETAP and PAJAC are both
phonologically ambiguous because they are
composed exclusively of letters that appear in both
the Cyrillic and the Roman alphabet. BETAP is a
word by its Cyrillic reading (viz., /vetar/ which
means "wind") and is meaningless by its Roman
reading (viz., /betap/). Conversely, PAJAC is a
word by its Roman reading (viz., /pajats/ which
means "clown") and is meaningless by its Cyrillic
reading (viz., /rajas/). More typically, however,
words contain at least one letter that is unique to
one alphabet or the other so that a transcription is
well-specified with respect to alphabet and
phonology (for example, VETAR pronounced
/vetar/ and ПАЈАЦ pronounced /pajats/ contain
letters that are unique to their respective
alphabets).

In one experiment (Experiment 2; Lukatela,
Feldman, Turvey, Carello & Katz, 1989), targets
(either ambiguous or unambiguous) were preceded
by a prime that was either alphabetically
consistent with the word reading of the ambiguous
word or was alphabetically inconsistent with the
word reading of that target. Primes were
presented for 700 ms with an ISI of 100 ms before
the target appeared for 1400 ms. All primes were
semantically associated to the critical word
targets. That is, BETAP (which means "wind" by
its Cyrillic reading) was preceded either by the
word for "storm," written in Cyrillic characters
(alphabetically consistent) or by the same word
written in Roman characters (alphabetically
inconsistent) and PAJAC (which means "clown" by
its Roman reading) was preceded by the word for
"circus," written in Roman characters or by the
same word written in Cyrillic characters.
Similarly, VETAR (which means "wind" by its
Roman reading and cannot be read as Cyrillic)
was preceded by the word for "storm," written
either in Roman characters or in Cyrillic
characters and ПАЈАЦ (which means "clown" by
its Cyrillic reading and cannot be read as Roman)
was preceded by the word for "circus," written
either in Cyrillic characters or in Roman
characters.
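
The distinction between ambiguous and unique
targets in this design follows directly from the sets
of shared letters listed earlier. The Python sketch
below shows one way such a classification could be
computed; it is an illustration only. It assumes,
as a simplification, that strings are typed with
Roman capitals standing in for the look-alike
shared letters (as they are printed in this text),
and the function name classify and its labels are
hypothetical.

```python
# Shared letters as listed in the description of the model.
SHARED_UNAMBIGUOUS = set("AEOJKMT")  # one phonemic interpretation
SHARED_AMBIGUOUS = set("BCHP")       # two phonemic interpretations


def classify(string):
    """Label a letter string by alphabetic and phonological ambiguity."""
    letters = set(string.upper())
    if not letters <= (SHARED_UNAMBIGUOUS | SHARED_AMBIGUOUS):
        # At least one letter belongs to only one alphabet.
        return "alphabetically unique"
    if letters & SHARED_AMBIGUOUS:
        # Every letter is shared and at least one has two readings.
        return "phonologically ambiguous"
    return "alphabetically ambiguous, phonologically unique"


for item in ("BETAP", "PAJAC", "VETAR", "MAMA"):
    print(item, "->", classify(item))
```

Under these assumptions BETAP and PAJAC come out
as phonologically ambiguous, VETAR as
alphabetically unique, and MAMA as alphabetically
ambiguous but phonologically unique.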

Min F analyses conducted on word latencies
between 1500 ms and 400 ms revealed significant
effects of (consistent/inconsistent) alphabet
context and of ambiguity as well as a significant
interaction between the two. Alphabet
inconsistency of prime and target slowed lexical
decision to phonologically ambiguous
transcriptions of words by 63 ms and hurt
accuracy by 15.9% relative to the consistent
condition. That is, phonologically ambiguous
BETAP following "storm" printed in Cyrillic
characters was recognized faster and more
accurately than BETAP following "storm" printed
in Roman characters. For phonologically unique
transcriptions of those same words, however,
alphabet consistency had a nonsignificant effect of
12 ms on latency and 0.2% on accuracy. For
example, VETAR following "storm" printed in
Roman characters was not recognized significantly
faster or more accurately than VETAR following
"storm" printed in Cyrillic characters.

The significance of this outcome with respect to
understanding the effect of alphabetic context on
word recognition lies in the observation that
latency (and errors) for phonologically ambiguous
words is dramatically affected by consistency of
alphabetic context whereas no analogous effect of
alphabet consistency was observed for
phonologically unambiguous words. A similar
outcome was observed in a naming task
(Experiment 4; Lukatela, Feldman, Turvey,
Carello & Katz, 1989) where alphabet consistency
of prime with the word reading of the target
reduced latencies for ambiguous target words by
52 ms and improved accuracy by 4.6% but, for
unambiguous words, alphabet consistency had a
nonsignificant effect of 8 ms on latencies and 1.6%
on errors.

It is important to note that the specification of
alphabet by a prior occurring context affects
lexical decision and naming of phonologically
ambiguous words not only when related word units
appear as primes but also when unrelated words
and nonwords appear. In fact, the reduction in
recognition latencies to ambiguous words in
alphabetically consistent contexts relative to
alphabetically inconsistent contexts was 86 ms
when contexts were defined by unrelated words
and was 97 ms when context was defined by a
meaningless string of predominantly unique
consonants (Experiment 1; Lukatela, Turvey,
Feldman, Carello & Katz, 1989). As generally
described, a context can bias but will not
necessarily restrict processing to one alphabet.
That is, all phonemic interpretations permitted by
an orthographic string will be activated, at least
partially. Finally, because words, both related and
unrelated, as well as unpronounceable letter
strings can serve as contexts, the effect of context
on the activation of letter and associated phonemic
units is unlikely to occur at the word level and
more plausibly occurs at the linkage between
letter and phonemic units.

It is interesting to note that lexical effects can
sometimes override the consistent biasing toward
one alphabet over another. For example, the word
meaning "harem" can be written as either XAPEM,
which is a Cyrillic form, or as HAREM, which is a
Roman form, but the combination HAPEM is
meaningless. If the Roman and Cyrillic
interpretations are assigned independently for
each of the two ambiguous graphemes, this
meaningless string can be pronounced in four
different ways. One combination is of particular
interest: By treating the H grapheme as Roman
and the P grapheme as Cyrillic, the word meaning
"harem" can be produced from HAPEM. This
response constitutes a virtual word. In a lexical
decision task, error rates for pseudowords with
this structure (i.e., virtual word responses)
averaged 42% when they were presented in the
context of an unassociated word and increased
significantly to 60% in the context of a word that
was associated (e.g., the word for "sultan") to the
mixed alphabet reading of this string. And, in the
context of an associated prime, correct rejection
latencies were slowed by 23 ms relative to the
unassociated context (Experiment 4; Lukatela,
Turvey, Feldman, Carello & Katz, 1989). In the
naming task, 43% of responses to these strings
were interpreted as words in the unassociated
context and that percentage increased to 63% in
the associated context. Similarly, latencies for
virtual words named as words were 32 ms faster
in the associated context than in the unassociated
context (Experiment 5; Lukatela, Turvey,
Feldman, Carello & Katz, 1989). Evidently,
influences of word level activation on activation at
the phonemic level can offset the inhibition of
letters belonging to the alphabet not specified by
context. That is, in both the lexical decision and
the naming task, word level processes can
contribute to the pattern of activation in that
under some circumstances skilled readers will
activate both alphabets in order to interpret a
pseudoword as a word.
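
The four pronunciations of HAPEM can be
enumerated mechanically. The Python sketch below
is offered only as an illustration of the
independent assignment of alphabets to the two
ambiguous graphemes; the dictionaries, the function
name readings, and the plain-letter phoneme labels
(with x standing for the sound written H in the
Roman alphabet) are assumptions made for the
example.

```python
from itertools import product

# Phonemic readings of the two ambiguous graphemes, by alphabet.
AMBIGUOUS = {
    "H": {"Cyrillic": "n", "Roman": "x"},
    "P": {"Cyrillic": "r", "Roman": "p"},
}
# Shared letters of HAPEM with a single reading.
UNAMBIGUOUS = {"A": "a", "E": "e", "M": "m"}


def readings(string):
    """Map each per-grapheme alphabet assignment to its phoneme string."""
    positions = [i for i, ch in enumerate(string) if ch in AMBIGUOUS]
    result = {}
    for choice in product(("Cyrillic", "Roman"), repeat=len(positions)):
        # Placeholder "?" is overwritten below for ambiguous positions.
        phonemes = [UNAMBIGUOUS.get(ch, "?") for ch in string]
        for pos, alphabet in zip(positions, choice):
            phonemes[pos] = AMBIGUOUS[string[pos]][alphabet]
        result[choice] = "".join(phonemes)
    return result


HAPEM = readings("HAPEM")
for assignment, phonemes in HAPEM.items():
    note = "(mixed alphabets)" if len(set(assignment)) > 1 else ""
    print(assignment, phonemes, note)
```

In this sketch the assignment (Roman, Cyrillic)
yields xarem, which corresponds to the virtual word
described above; the other three assignments yield
the remaining pronunciations.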

Alphabet contexts that are consistent across
prime and target facilitate recognition of
ambiguous target words and sometimes they have
a numerically small and statistically
nonsignificant effect on unambiguous target words
(Experiment 3; Lukatela, Turvey, Feldman,
Carello & Katz, 1989). The proposed
interpretation of this finding is that the effect of
context is to help disambiguate the mapping
between letter and phoneme levels. An alternative
interpretation is that context could serve to
facilitate some later postlexical process.
Accordingly, as processing of the context becomes
progressively less complete, either in terms of the
number of levels stimulated or in terms of the
number of elements processed at one level, then
strategic and postlexical processing suffers most.
By this reasoning, if alphabet biasing is automatic
and prelexical, then effects should not vary under
experimental conditions that encourage
incomplete as contrasted with relatively complete
processing of alphabetic information.
Alternatively, if alphabet biasing is subject to
postlexical strategies and checks then the effect of
alphabet may not be evident under conditions that
render the context less available.

Study 6: Manipulations of alphabetic 
accessibility 

In principle, alphabetic contexts could exert
their influence either early or late in the
recognition process. A final methodology for
examining the locus of influence of alphabetic
context entailed visual presentation conditions in
which the availability for processing of alphabetic
context was varied by following it with a mask
(Lukatela et al., 1991). As in the studies described
above, subjects were required to name
phonologically ambiguous target words in either
Roman or Cyrillic alphabet prime contexts. In one
experiment, contexts consisted of 3-5 unique
letters which were presented for 70 ms and were
followed after an ISI of 30 ms by a target. In this
nonmasked condition, results replicated the
typical effect of alphabet consistency on naming
whereby subjects were 131 ms faster (and 32%
more accurate) when both the context and the
target were in the same alphabet than when they
were in different alphabets (Experiment 1;
Lukatela et al., 1991). Similar results were
obtained both when the context duration was
reduced to 18 ms and was preceded at an ISI of 0
ms by a masking pattern (Experiment 2) and
when the context consisted of a single unique
letter (Experiment 4). Evidently, it is not the
lexical property of the prime that governs its
ability to influence the activation of graphemic
and phonemic units.

Effects of alphabetic context have also been
observed when the context follows the ambiguous
target and is itself masked so that identifying the
alphabetic context and working from there to the
target is highly implausible. It is claimed that if
processing of the target is disrupted differentially
according to linguistic properties of the masked
context (and figural properties are held constant),
then properties of the masked context must
contribute to lexical access for the target and
cannot simply influence postlexical processes. In
one experiment (Experiment 5; Lukatela et al.,
1991), targets consisted of phonologically
ambiguous letter strings and their unambiguous
alphabet controls and contexts consisted of strings
of unique consonants (some of which were
repeated) printed in the alphabet that was either
consistent or inconsistent with the word reading of
the ambiguous letter string. Phonologically, all
alphabetically consistent and inconsistent
contexts were equivalent. Targets appeared for 40
ms and were followed at an ISI of 0 ms by a
context letter string. The context was presented
for 40 ms and was followed by a series of hash
marks that remained until the onset of the next
trial. In that experiment, consistent with previous
studies, the difference between correct target
identification with alphabetically consistent and
inconsistent contexts was 6.98% for ambiguous
targets and 1.46% for unambiguous targets. This
interaction was statistically significant and was
interpreted as evidence that alphabet congruity
between target and masked context reduces (and
alphabet incongruity augments) the disruption to
processing caused by the mask. Because backward
pattern masks are assumed to interfere with
lexical access, these results were interpreted as
prelexical in locus. That is, the benefit associated
with alphabetically consistent contexts and
targets arises at the level of letter as contrasted
with word units.

An interesting prediction that follows from the 
claim that alphabet effects arise as inhibition at 
the level of letter units is that when target and 
subsequent pseudoword mask differ with respect
to alphabet, the letters of the mask will be 
activated relatively slowly because they must 
overcome prior inhibition from the target. As a 
consequence, the phoneme units of the target will 
have more time to activate possible word units. Of 
course, the effect of masked pseudowords will also 
be affected by the phonemic similarity of target 
and mask. 

Phonology and alphabet of target and masked 
prime were manipulated in a backward priming 
paradigm in which a phonologically unambiguous 
word target (20 ms) was followed by a pseudoword
mask (20 ms) and then by a pattern mask
(Lukatela & Turvey, 1990b). Effects of
phonological similarity were replicated. Moreover,
a significant interaction of phonology and alphabet was
obtained. As anticipated, for phonologically 
dissimilar pairs, alphabetically mismatched 
targets and masks were identified significantly 
more accurately than matched pairs. For 
phonologically similar pairs, there was a 
nonsignificant trend in the opposite direction. The 
effect of phonological properties of the mask on 
target identification suggests enhanced activation 
of phonemic units activated while processing the 
target. The interaction suggests a transient 
inhibition of letter units due to alphabetic status 
of the mask. That is, under very restricted viewing
conditions, alphabetic context can influence the 
identification of unambiguous letter strings in a 
manner not unlike its influence on ambiguous 
letter strings. 

CONCLUSION 

In six word recognition studies using variations
of the lexical decision and naming tasks, evidence
for alphabetically-defined visual effects was
examined. The experimental manipulation
common to all studies exploited the bi-alphabetic
fluency of skilled readers of Serbo-Croatian and
entailed a comparison of presenting context and
target strings (or successive presentations of word
or pseudoword letter strings) in either the same or
in different alphabets. It was observed that
relative to alternating alphabet, the preservation
of alphabet over successive presentations of a
word had no significant effect on recognition.
Effects of alphabetic context were evident,
however, for target strings that were
phonologically ambiguous and were typically slow
in both the lexical decision and naming tasks. The
presence of a letter unique to one alphabet, either
in the target string itself or in a prior or
later-occurring context, was sufficient to diminish
and sometimes to eliminate any significant effect
of phonological ambiguity. This series of results
was interpreted as evidence of mandatory
phonological processing in Serbo-Croatian word
recognition and suggested a processing
architecture efficient at handling two sets of
mappings between letter and phoneme.

In the experimental literature, phonological
effects are sometimes interpreted as postlexical
effects and sometimes interpreted as occurring
prior to lexical access. Phonological effects in
Serbo-Croatian have been interpreted as
reflecting early processes for several reasons
including the findings that they occur for both real
word and orthographically legal but meaningless
pseudoword targets and that the alphabetic
context need not be fully processed in order to
influence processing of the target. That is,
unmasked as well as masked alphabetic contexts
have similar effects on phonologically ambiguous
letter strings and alphabetic contexts can be
words, pseudowords or a single letter. For
phonologically unambiguous letter strings, effects
of alphabetic context are rare. Finally, rates of
target identification under alphabetically matched
and mismatched conditions with phonologically
mismatched masks suggest that effects of
alphabetic context on unambiguous strings exist
but may be quite transient.

The proposed model of a reading mechanism for
the skilled reader of two alphabets entails letter,
phoneme and word units. Effects in lexical
decision are constrained primarily by activity at
the word level whereas naming is constrained
primarily by activity at the phonemic level. Effects
of alphabet arise relatively early in the model and
tend to be graded in nature. For example,
inhibitory connections between alphabets exist at
the letter level so that within a word, activation of
a letter unique to one alphabet will tend to reduce
the level of activity of letter units in the
alternative alphabet. Similarly, the influence of a
context that specifies alphabet is to bias the
connections between letter and phoneme toward
the designated alphabet. In sum, the processing
microstructure for word recognition in Serbo-
Croatian includes principles whereby inhibitory
connections exist between the letter units of the
two alphabets and the systematic covariation of
letters and phonemes within each alphabet is
realized. Alphabet effects at the word level are
not typically observed.

Morphological Analysis of Disrupted Morphemes:

Evidence from Hebrew 



Laurie Beth Feldman† and Shlomo Bentin††



In concatenated languages such as English, the morphemes of a word are linked linearly so
that words formed from the same base morpheme also resemble each other along orthographic
dimensions. In Hebrew, by contrast, the morphemes of a word can be but are not generally
concatenated. Instead, a pattern of vowels is infixed between the consonants of the root
morpheme. Consequently, the shared portion of morphologically-related words in Hebrew is
not always an orthographic unit. In a series of three experiments using the repetition priming
task with visually-presented Hebrew materials, primes that were formed from the same base
morpheme and were morphologically-related to a target facilitated target recognition.
Moreover, morphologically-related prime and target pairs that contained a disruption to the
shared orthographic pattern showed the same pattern of facilitation as did nondisrupted pairs.
That is, there was no effect of disrupting, over successive prime and target presentations, the
sequence of letters that constitutes the base morpheme or root. In addition, facilitation was
similar across derivational, inflectional and identical primes. The conclusion of the present
study is that morphological effects in word recognition are distinct from effects of shared
structure.



The internal structure of a word plays a key role
in its recognition. Whereas much work on visual
word recognition has focused on phonology, more
recent efforts have focused on aspects of
morphology. One experimental task that is
sensitive to the morphological components of
words is repetition priming. Significant
facilitation among visually-presented
morphologically related words in the repetition
priming variant of the lexical decision task is well
documented (Stanners, Neiser, Hernon, & Hall,
1979). Generally, responses to targets that are
formed around the same base morpheme as their
(morphologically-related) primes are faster and
more accurate than to targets following unrelated
primes. Sometimes, the facilitation with
morphological relatives as primes is equivalent to
the effect of an identical repetition of the target.
Sometimes, it is numerically reduced relative to
identical repetitions but is still statistically
reliable (Fowler, Napps, & Feldman, 1985).
Effects of morphological relatedness with visually
presented materials in the lexical decision task
have been found across a variety of languages
including Serbo-Croatian (Feldman & Fowler, 1987),
English (Feldman, 1991a; Fowler et al., 1985) and
Hebrew (Bentin & Feldman, 1990) as well as
American Sign Language (Hanson & Feldman,
1989; see also Emmorey, 1989). At lags larger
than zero or if more than a few seconds separate
the second presentation from the first, the pattern
of facilitation due to morphological relatedness is
distinct from the pattern due to semantic
association (Bentin & Feldman, 1990; Dannenbring &
Briand, 1982; Henderson, Wallis & Knight, 1984;
Napps, 1989). At average lags of 10 items,
orthographic similarity of morphologically unrelated
prime and target (e.g., pairs such as DIET and
DIE) produces neither facilitation nor inhibition
(Bentin, 1989; Feldman & Moskovljević, 1987;
Hanson & Wilkenfeld, 1985; Napps & Fowler,
1987). In short, the repetition priming procedure
is a viable tool for studying how the morphological
relation among words is represented in the lexicon
and how that relation distinguishes itself from
other types of similarity.

An examination of morphologically complex
words across languages reveals two basic
linguistic principles by which such words are
constructed. In one, discrete morphemic
constituents are linked linearly. There is a base
morpheme to which other elements are appended
so as to form a sequence. This principle defines a
concatenative morphology, of the kind
characteristic of English and Serbo-Croatian, for
example. In languages with a concatenative
morphology, suffixes and prefixes are regularly
appended to the base morpheme in a manner that
preserves its phonological and orthographic
structure. According to the other principle,
morphemic units are not just appended to a base
form, but also modify its internal structure. This
principle defines a nonconcatenative morphology
of the kind found in Hebrew, for example
(McCarthy, 1981).

In the repetition priming studies of
morphological processing conducted with
visually-presented English and Serbo-Croatian
materials described above, primes and targets
were typically constructed around the same base
morpheme and only differed with respect to affix.
As a result, among morphological relatives, the
base morpheme remained intact and unchanged.
Exceptions consist of studies that explored effects
of changed spelling and/or pronunciation among
morphologically related pairs (e.g., HEAL and
HEALTH or SLEEP and SLEPT) at long lags
(Fowler et al., 1985; Stanners et al., 1979; see also
Kempley & Morton, 1982), studies that examined
spelling and sound changes among morphological
relatives at varying short lags and SOAs (Napps
& Fowler, 1987) and a study with German
materials that examined umlaut changes
(Schriefers, Friederici, & Graetz, 1992). Even in
those studies, however, the changes introduced to
the base morpheme were relatively minor (e.g.,
consisting of a vowel or a vowel plus consonant
change) as compared to the portion that was
preserved. The structure of materials in those
studies reflects a general principle of construction
for languages with a concatenative morphology.
That is, when morphemes are concatenated it is
almost always the case that the phonological and
orthographic structure of the base morpheme will
be preserved among regular morphological
relatives (but see Kelliher & Henderson, 1990). A
morpheme tends to be a sequence of consonants
and vowels that forms a syllable (or several) and
concatenative word formation processes do not
disrupt the coherence of the morpheme. The
implication of this is that in concatenated
languages such as English, morphological
relatives will tend to have sequences of letters in
common. As applied to the construction of
materials in the typical repetition priming task
where morphologically-related pairs are formed by
adding a suffix, the initial portion of primes and
targets will tend to be identical.

Nonconcatenative formation processes are less 
likely to preserve the integrity of the base 
morpheme. The base morpheme in Hebrew is an 
abstract form which is called the ^oot* and is 
comprised of a string of three (or four) consonants. 
The root is not a complete phonological unit as it 
includes no vowels. Superimposed on the root is 
the Srord pattern* which consists primarily of 
vowels. The root together with a word pattern 
constitute the word. Some word patterns consist 
exclusively of vowels and typically, the vowels are 
infixed between the consonants of the root Other 
word patterns include a consonant prefix (e.g., M 
plus vowel) or a suffix (e.g., vowel plus T) as well. 
Both the word pattern as well as the root are 
productive and convey morphological and 
semantic information (Ornan, 1971). For example, 
the root SH-M-N can take many word patterns
including -e-e- to form the noun /ʃemen/ (which
means "oil"), and -a-e- to form the adjective
/ʃamen/ (which means "fat"). Similarly, the root
Z-M-R can take many word patterns including -a-a-,
-e-e-, and -i-e-. Note that roots such as Z-M-R and
SH-M-N are productive in that they generate
several words in the semantic fields related to
singing and oil respectively. Similarly, the word
patterns are productive and tend to modify the
root in systematic ways (Berman, 1978). For
example, the -a-a- word pattern tends to denote
an agent, the -e-e- pattern an object, and the -i-e-
the past tense of an active verb in the third person
singular. Thus, in Hebrew, /zamar/ meaning "a
singer," /zemer/ meaning "a song," and /zimer/
meaning "he sang" are all morphologically-related
because they share the Z-M-R root and are all
bimorphemic because they include a word pattern
as well as a root. 
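
The interleaving of root and word pattern described above can be sketched in a few lines of Python. This is an illustration only and not part of the study: the helper name `apply_pattern`, the convention that each hyphen in a pattern marks the slot for one root consonant, and the rough romanization are all assumptions made here.

```python
def apply_pattern(root: str, pattern: str) -> str:
    """Interleave root consonants into the hyphen slots of a word pattern.

    Hypothetical helper for illustration: `root` is written like "Z-M-R"
    and `pattern` like "-a-a-", where each "-" is a slot for one consonant.
    """
    consonants = [c.lower() for c in root.split("-")]
    if pattern.count("-") != len(consonants):
        raise ValueError("pattern must have one slot per root consonant")
    remaining = iter(consonants)
    # Copy pattern material as-is; fill each slot with the next consonant.
    return "".join(next(remaining) if ch == "-" else ch for ch in pattern)


# The three Z-M-R forms discussed in the text.
for pattern in ("-a-a-", "-e-e-", "-i-e-"):
    print(pattern, apply_pattern("Z-M-R", pattern))
```

Because the consonants are distributed over the slots, the root never appears as one contiguous substring once any vowel letter is written, which is the property the experiments exploit.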

It is useful to point out that when different roots 
accept the same word pattern, the semantic 
information carried by that word pattern is not 
fully consistent. Specifically, although the word
pattern -a-a- often denotes an agent, it is also
sometimes used to denote the past tense singular
form of active verbs as well as some adjective
forms. Compare, for example, the contribution of
the -a-a- pattern to the root Z-M-R (/zamar/
meaning "singer") with its effect on the root L-V-N
(/lavan/ meaning "white"). Similarly, the semantic
contribution of the root is not consistent in any 
simple sense over all morphologically-related 
items. 

The principle of building words in Hebrew, in 
contrast to that of languages such as English and 
Serbo-Croatian, dictates that the phonological and 
orthographic similarity of morphologically-related 
words in Hebrew will be spread over several 
syllables. Root morphemes consist of a sequence of 
consonants and the requisite vowels for a 
particular word pattern are infixed between the 
consonants. Consequently, the root morpheme 
constitutes neither an orthographically nor a 
phonologically coherent whole. Rather than 
forming continuous units, morphemes tend to be
disrupted and distributed over several syllables.

Alternative accounts of morphological 
effects 

Accounts of morphological effects in word 
recognition often minimize the role of purely 
linguistic variables such as the morpheme and 
rely on orthographic and phonological patterning 
of letter units or on semantic similarity in 
conjunction with shared orthographic and 
phonological structure. For example, Seidenberg 
(1987) suggested that patterns of high and low 
probability of transition among sequences of 
letters could account for (syllabic or) morphological
patterning because transitional
probabilities of letter sequences that straddle a 
(syllabic or) morphological boundary tend to be 
low (bigram troughs) relative to probabilities of 
sequences internal to a unit. In an illusory 
conjunction paradigm, subjects who tended to 
misidentify the color of the target letter were more 
likely to assign the color of another letter from 
within the same morphological unit than from an 
adjacent but different unit. Although this result 
provides support for orthographic (specifically, 
bigram) structure in a particular task, it does not 
negate the influence of morphology in word 
recognition. Recently, in fact, morphological 
effects have been demonstrated in a lexical 
decision task where color boundaries within a 
word were either consistent or inconsistent with 
morphological boundaries (Rapp, 1992). Moreover,
morphological boundary effects were evident both
in words with bigram troughs at the boundary
and in words without troughs. Similar effects have
also been reported for compound words
(Prinzmetal, Hoffman, & Vest, 1991). Whether or
not orthographic factors in morphological 
processing prove to be relevant for languages with 
concatenative morphologies such as English, it is 
difficult to see how they could be adapted easily to 
nonconcatenated languages such as Hebrew 
because morphemes are not always coherent 
units. In sum, the tendency to interpret 
morphological effects as orthographic patterning 
makes it essential to examine orthographic 
influences on morphological processing in a 
language in which the morpheme is not always an 
orthographic entity. 
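
The notion of a bigram trough can be made concrete with a small sketch. It is a simplification offered under stated assumptions: raw bigram counts from a toy word list stand in for the transitional probabilities discussed above, and the function names and the word list are hypothetical rather than taken from any of the cited studies.

```python
from collections import Counter


def bigram_counts(words):
    """Count how often each adjacent letter pair occurs in a word list."""
    counts = Counter()
    for word in words:
        counts.update(word[i:i + 2] for i in range(len(word) - 1))
    return counts


def trough_positions(word, counts):
    """Return indices i where the bigram word[i:i+2] is rarer than both
    of its neighbouring bigrams (a local minimum, i.e. a "trough")."""
    freqs = [counts[word[i:i + 2]] for i in range(len(word) - 1)]
    return [
        i for i in range(1, len(freqs) - 1)
        if freqs[i] < freqs[i - 1] and freqs[i] < freqs[i + 1]
    ]


# Toy word list (an assumption for illustration, not a real corpus).
toy_corpus = ["walk", "talk", "walked", "talked", "jumped", "helped", "walks"]
counts = bigram_counts(toy_corpus)
print(trough_positions("walked", counts))
```

In this toy list the only trough in "walked" falls on the bigram "ke", which straddles the boundary between the base and the suffix. A measure of this kind presupposes that each morpheme occupies a contiguous run of letters, which is exactly what fails for a Hebrew root interrupted by a vowel letter.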

The emphasis on orthographic patterning is also 
evident in morphological parsing models in which 
the affixes of a morphologically complex word are 
first eliminated and then the remaining portion of 
the letter string is matched to candidate entries in 
a lexicon (e.g., Taft & Forster, 1975). Although
affix parsing models may be plausible in
languages such as English in which the repertoire
of morphological affixes is relatively limited, their
practicality is severely compromised in languages
with differing morphological structures (cf. 
Henderson, 1989). In Turkish, for example, 
sequences of morphological affixes may be 
appended to one root and the form of those affixes 
may vary due to phonological factors. Moreover, 
some affixes may be applied more than once. 
Consequently, a process of suffix stripping with 
subsequent analysis of the remainder may have to
undergo many iterations before the root can be
successfully identified. It has been proposed that
for Turkish, priority in morphological analysis of a
word goes to the root and only then is its sequence
of affixes identified. That process starts at the root
and proceeds from left to right (Hankamer, 1989).
In contrast, morphological parsing in Hebrew 
poses special problems because the root morpheme 
constitutes neither a coherent phonological nor 
orthographic unit and morphological formation is 
less systematic.
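
The iterative character of affix stripping can be illustrated with a minimal sketch. It is not the procedure of any cited model: the function name, the tiny English lexicon, and the suffix inventory are hypothetical, and the longest-suffix-first rule is simply one plausible choice.

```python
def strip_suffixes(word, lexicon, suffixes):
    """Strip one suffix per pass until the remainder is a lexical entry.

    Returns (stem, stripped_suffixes); stem is None when no further
    suffix can be removed and the remainder is still not in the lexicon.
    """
    stem, stripped = word, []
    while stem not in lexicon:
        # Try longer suffixes first so "ness" is preferred over "s".
        for suffix in sorted(suffixes, key=len, reverse=True):
            if stem.endswith(suffix) and len(stem) > len(suffix):
                stem = stem[: -len(suffix)]
                stripped.append(suffix)
                break
        else:
            return None, stripped
    return stem, stripped


# Hypothetical toy lexicon and suffix list, for illustration only.
lexicon = {"help", "walk"}
suffixes = ["ness", "ful", "ed", "s"]
print(strip_suffixes("helpfulness", lexicon, suffixes))
```

Each pass through the loop corresponds to one iteration of stripping and lookup, so a word carrying a long chain of suffixes requires many passes. The sketch also only ever removes material from the end of the string, which is why it has no obvious extension to a root whose consonants are interleaved with the word pattern.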

In a study of morphological analysis using 
repetition priming with Hebrew materials (Bentin 
& Feldman, 1990), patterns of facilitation for 
prime-target pairs that were related by semantic 
association and by a shared (morphological) root 
were compared. The study exploited the fact that 
although words that are constructed around the
same root are, by definition, morphologically- 
related, the semantic relation among 
morphologically-related forms in Hebrew may 
vary dramatically. All of the morphological 
relations of prime and target pairs were 
derivational in nature. As a consequence, the 
meaning of a derived form was not always
predictable in any simple way from a semantic 
analysis of its component morphemes (see Aronoff, 
1976). Facilitation due to morphological 
relatedness was evident at lags that averaged ten 
intervening items. Moreover, facilitation was 
equivalent for semantically-close (e.g., kitchen- 
cook) and semantically-distant (e.g., slaughter- 
cook) prime-target relatives. That is, the 
magnitude of facilitation to the target meaning 
"cook" was equivalent following primes meaning
"kitchen" and "slaughter." These findings are, in
fact, consistent with the claim based on English 
materials that at long lags semantic overlap 
between prime and target does not influence the 
magnitude of repetition priming (Feldman,
1991a). In summary, when all related words were
derivational in nature, and an average of ten 
items intervened between prime and target, 
facilitation due to morphological relatedness in 
the repetition priming task was not sensitive to 
the semantic similarity of prime and target. This 
outcome suggests that morphological analysis is 
not based on the semantic overlap of 
morphological relatives. 

Covariants of morphological structure in
Hebrew 

With respect to orthographic structure of words 
in Hebrew, it is important to note that most 
vowels are represented by optional diacritics 
placed beneath, above or within the preceding 
consonant although some vowels are represented 
by letters. Because words are conventionally 
written without vowel diacritics, morphologically- 
complex words that share a root morpheme but 
differ with respect to word pattern will tend to be 
orthographically but not phonologically 
indistinguishable. For example, the words /gever/
and /gavar/ are both written גבר. (Note that in
contrast to English and to phonemic notation,
Hebrew is read from right to left.) These words are
morphologically related and mean "man" and
"overcome," respectively. Because the word
pattern is composed exclusively of vowels and 
because the /e/ and /a/ vowels are represented by 
optional diacritics, these two words have the same 
orthographic form as conventionally written. Of 
course, although both words have phonological 
forms that are created around the G-V-R root, 
their phonological forms differ because of the 
infixed vowels. By contrast, when vowels are 
written and particularly when one of them is 
represented by a letter, then the orthographic 
pattern of the root morpheme, like its phonological
pattern, is no longer a coherent unit. For example,
the sequence משמר is read /miʃmar/, meaning
"guard," whereas the sequence שומר is read
/ʃomer/, which means "guardian." These words
are morphologically related as they share the root
ש-מ-ר (SH-M-R). They differ with respect to
phonological form, as well as orthographic form,
however, because in one case the letter for the /o/
vowel of the word pattern is infixed between the
consonants of the root. In the present study, we 
use patterns of facilitation for morphologically 
complex words in the repetition priming task to 
ask whether the morphological processing of 
disrupted roots as typically occurs in Hebrew is
similar to the processing of continuous roots as
typically occurs in concatenated languages.

Linguists distinguish between two types of 
morphologically complex words. Words that share 
a base morpheme but differ with respect to 
inflectional affixes are generally considered to be 
forms of the same word (e.g., CALCULATE, 
CALCULATED). Words that share a base 
morpheme but differ with respect to derivational 
affixes are generally considered to be different 
words (e.g., CALCULATE, CALCULATOR, 
CALCULATION). As a secondary objective in the 
present study, we use patterns of facilitation to 
ask whether inflectional and derivational 
formations are likely to involve distinct types of 
representations and/or processing. 

Experimental evidence for this linguistic 
difference has been difficult to obtain in English. 
One possible reason for the failure to find evidence 
for the linguistic distinction between inflectional 
and derivational formations is that the similarity 
of orthographic form cannot be equated in 
English. Specifically, because inflectional relatives 
and derivational relatives tend to differ with 
respect to length of affix (or because the
transitional probability from the final letter of the 
base morpheme to the initial letter of the affix 
differs for inflectional and derivational affixes), 
these comparisons are not appropriate. 

In Hebrew, by contrast, it is possible to identify 
pairs of words that are, respectively, 
inflectionally- and derivationally-related and are 
equated with respect to orthographic and 
phonological similarity to the target. By
definition, all such words are morphologically 
related to each other because they are constructed 
around the same root morpheme. Words in a pair 
differ with respect to the word pattern but 
inflectionally-related and derivationally-related
word patterns can be matched with respect to 
presence (and letter length) of prefixes and/or 
suffixes. In this way, the structural similarity to a
target of inflectional and derivational relatives 
can be matched so that types of morphological 
formations can be compared. 

To summarize, the primary goal of the present 
study was to examine the role of orthographic 
patterning in morphological analysis. Accordingly, 
using morphologically nonconcatenated Hebrew 
materials, the orthographic integrity of the base 
morpheme across morphological relatives was sys- 
tematically manipulated in the repetition priming 
task. Sometimes prime and target presentations 
preserved the same orthographic form of the root 
morpheme and sometimes they did not. A 
secondary goal of the present study was to 
compare facilitation by inflectional and 
derivational relatives. Primes and targets shared 
a common root and primes were either 
inflectionally- or derivationally-related or 
identical to the target. Lexical decision latency to
the target was compared following 
morphologically-related and identical primes. A 
series of three experiments was conducted in an 
attempt to uncover the contribution to word
recognition of orthographic similarity over and 
above that of morphological relatedness. 

EXPERIMENT 1 

Across languages, a variety of mechanisms for 
forming words exist, the most common being the 
addition or affixation of an element to a base 
morpheme (Matthews, 1974). Affixation includes 
three processes, defined by the position relative to 
the base morpheme, where addition occurs. These 
include prefixation, suffixation, and infixation in 
positions initial, final and internal to the base 
morpheme. Prefixation and suffixation entail the 
linear concatenation of elements, whereas 
infixation is nonconcatenative insofar as the 
integrity of the base morpheme is disrupted. As 
described above, the characteristic morphological 
process of Semitic languages, such as Hebrew,
relies on a skeleton of consonants into which a 
pattern of vowels is infixed (although a prefix or 
suffix may also be appended). The morphological 
system of Semitic languages is distinguished for 
its productivity, the manner in which semantic 
modification of the root occurs among complex 
forms that share a root, and for the 
nonconcatenativity of morphemes (Berman, 1978). 

As noted above, the orthographic integrity of the 
base morpheme is generally maintained in 
English and in Serbo-Croatian but not always 
preserved in written Hebrew. For processes of
infixation, morphological changes typically entail
appending different word patterns to a root, where
the word patterns specify the requisite vowels of a
word. When represented by a letter, vowels in the
word pattern necessarily disrupt the sequence of
consonants that comprise the root. Consider, for
example, the words נָפַל and נֵפֶל and compare
them with the target word נוֹפֵל. The target is the
present tense of the verb "to fall," in the third
person singular (pronounced /nofel/). The first
form is inflectionally-related to the target and is
pronounced /nafal/; it is the past tense of the same
verb in the same person. The second form is
derivationally-related to the target and is
pronounced /nefel/, which means "a dropout." By
definition, all three forms are morphologically
related because they share the same root נ-פ-ל
(N-F-L). Note, however, that in the target word,
the root morpheme is not continuous. It is
disrupted by the vowel ו /o/, which is part of the
word pattern. Contrast this pattern with that for
the words עֶבֶד and עָבַד as compared with the
target עֲבָדִים. The target is pronounced
/avadim/, meaning "slaves." The first word is
inflectionally-related to the target, is pronounced
/eved/ and is the singular form "slave." The
second word is derivationally-related to the target,
is pronounced /avad/, which is the past tense,
third person singular of the verb "to work." Note
that in this case, the orthographic root ע-ב-ד
remains intact in all related forms. The
orthographic similarity (due to preservation of the
orthographic pattern for the root) of the
morphological relatives depicted in the latter
example is characteristic of all regularly-related
pairs in English. In the present experiment with
Hebrew materials, the pattern of facilitation due
to morphological relatedness of prime-target pairs
was compared when the orthographic form of the
shared root was disrupted over prime and target
presentations (e.g., נ-פ-ל) and when it was intact
(e.g., ע-ב-ד).

It has already been demonstrated that in 
Hebrew, facilitation in the repetition priming task 
is sensitive to derivational relatedness of prime 
and target (Bentin & Feldman, 1990). If 
inflectional and derivational formations in 
Hebrew are similarly represented in the lexicon 
then it is anticipated that the magnitude of 
facilitation in the lexical decision repetition 
priming task will not vary with type of 
morphological relation, and a comparison of 
inflectional and derivational primes is included in 
the present investigation. It is anticipated that if 
orthographic similarity of prime to target is
independent of morphological relatedness then the 
pattern of facilitation for roots that are disrupted 
and roots that are not disrupted will not differ. 

Methods 

Subjects. Forty-eight first year students from
the Department of Psychology at Hebrew 
University participated in Experiment 1. All were 
native speakers of Hebrew. All had vision that 
was normal or corrected-to-normal and had prior 
experience in reaction-time studies. None had 
participated in other experiments in the present 
study. 

Stimulus materials. Forty-eight Hebrew word 
triplets were constructed. Each included three 
forms: a target word, a word that was 
inflectionally-related to it and a word that was 
derivationally-related to it. All members of a 
triplet were constructed from the same root
morpheme but they differed with respect to word 
pattern. The orthographic and phonemic overlap 
of morphologically-related words to their targets 
was systematically manipulated. Targets 
consisted of twenty-four verbs in present tense, 
third person singular and twenty-four plural 
nouns. For verb targets, the inflected forms were 
past tense formations (third person singular), and 
the derived forms were nouns in singular case. In 
the verb set, the roots were orthographically 
continuous in both inflected and derived forms, 
but the roots were disrupted in the target by the 
infixation of a letter vowel. For the noun targets, 
the inflected forms were the same nouns in 
singular and the derived forms were verbs in past 
tense (third person singular). In this set, the roots
were orthographically continuous in targets as 
well as related forms. 

Four types of words preceded each target across 
experimental lists. Words inflectionally- and 
derivationally-related to the target, an identical 
repetition of the target and an (orthographically,
phonologically and semantically) unrelated word
served as primes. The orthographic similarity of
the derived and inflected primes to their target
was matched within each triplet. (All word triplets
and their English translations are listed in
Appendix A.) The unrelated words had the same
morphological structure (word pattern) as did the
related words (for other targets) although they
necessarily had different root morphemes.
Ninety-six pseudowords were constructed by
combining meaningless three-consonant root
morphemes with real word patterns. Root
morphemes in nonwords were not repeated over
successive trials so as to enhance the orthographic 
salience of the words. 

Four test orders were assembled. Each list was 
comprised of 96 words and 96 nonwords. All items 
were presented with their vowels. The 96 words 
consisted of the 48 targets and their 48 primes.
Twelve targets were preceded by identical repeti- 
tions, 12 targets were preceded by derivationally- 
related primes, 12 were preceded by inflectionally- 
related primes and 12 targets were preceded by 
morphologically, orthographically and semanti- 
cally unrelated word primes. The lag between 
prime and target varied from 7 to 13 items
with an average of 10. The serial position of all 
target words and pseudowords was identical 
across test orders. The primes were rotated among
the four lists, so that within a list each type of
prime was equally represented and, across lists,
each target was preceded once by each of the four
types of primes. 
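
The counterbalancing just described has the structure of a Latin square. The sketch below shows one way such a rotation could be generated and checked; the modular assignment rule and all names are assumptions for illustration, since the exact assignment of targets to prime types is not reported.

```python
from collections import Counter

PRIME_TYPES = ["identity", "inflection", "derivation", "unrelated"]
N_TARGETS = 48
N_LISTS = 4


def prime_type(target: int, test_list: int) -> str:
    """Prime type for a target (0-47) in a test list (0-3).

    Assumed rule: shift the prime type by one step for each successive list.
    """
    return PRIME_TYPES[(target + test_list) % len(PRIME_TYPES)]


lists = [
    [prime_type(target, test_list) for target in range(N_TARGETS)]
    for test_list in range(N_LISTS)
]

# Within each list, every prime type should precede 12 targets.
for test_list, assignment in enumerate(lists):
    print(test_list, dict(Counter(assignment)))
```

Under this rule each list contains 12 targets per prime type, and each target meets every prime type exactly once across the four lists, so prime type is compared within subjects and within items.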

Procedure. Twelve subjects were randomly 
assigned to each of the stimulus lists. Thus, the four
prime types were compared within subjects across
all 48 targets and within stimuli across all 48
subjects. Speed and accuracy were equally 
emphasized in the instructions. 

The stimuli were presented approximately 80 
cm from the subject, at the center of a Macintosh 
monochromatic screen. Each item was exposed 
until the subject responded or for 2000 ms, 
whichever came first. The interval between onset
of successive stimuli was 2500 ms. 

The dominant hand was used for word re- 
sponses and the nondominant hand was used for 
nonword responses. Latencies were measured 
from stimulus onset, to the nearest millisecond us- 
ing a special software algorithm^ and errors were 
automatically registered. Following the instruc- 
tions, a practice list comprised of 24 items (two 
identity, two inflectional and two derivational 
prime-target pairs as well as 12 pseudowords) was 
presented. After a short pause, the experimental 
list followed in one block. The complete experi- 
mental session lasted about 20 minutes. 

Results and Discussion 

Lexical decision reaction times more extreme
than two SDs from the mean for subjects and for
items in each condition were excluded from all 
analyses. Fewer than 2% of all responses were 
eliminated by these constraints. Mean lexical 
decision latencies and errors in Experiment 1 are 
summarized in Table 1. 
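
The two-SD exclusion rule can be sketched as follows. This is a simplified illustration under stated assumptions: it trims a single cell of latencies, uses the sample standard deviation, and runs on made-up values, whereas the analysis reported here applied the criterion by subjects and by items within each condition.

```python
from statistics import mean, stdev


def trim_outliers(rts, k=2.0):
    """Drop latencies more than k sample SDs from the mean of `rts`."""
    m, sd = mean(rts), stdev(rts)
    return [rt for rt in rts if abs(rt - m) <= k * sd]


# Hypothetical latencies (ms) for one condition; not data from the study.
rts = [690, 700, 710, 695, 705, 700, 690, 710, 700, 1500]
kept = trim_outliers(rts)
print(len(rts) - len(kept), "response(s) excluded")
```

Because the mean and SD are computed before trimming, a single very slow response inflates both, so the criterion is conservative for small cells.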

Table 1. Mean lexical decision time (ms) and percent errors
for targets following morphologically-related and
unrelated prime words in Experiment 1 (SEm in
parentheses).

                        PRIME TYPE
          Unrelated   Identity   Inflection   Derivation
RT           769         701        709          710
            (14)        (14)       (14)         (13)
Errors       0.?         0.5        0.4          0.5
           (0.3)       (0.2)      (0.1)        (0.2)

The statistical reliability of the repetition 
priming effect was tested in each task by ANOVA 
with repeated measures across subjects (F1) and
across stimuli (F2). In the lexical decision task the
effect of prime type was significant, F1
(3,141)=20.72, MSe=2279, p<.0001, and F2
(3,141)=18.67, MSe=2704, p<.0001. Tukey-A post
hoc comparisons revealed that whereas all prime
types significantly facilitated lexical decision
relative to the unrelated condition (p<.01), the
magnitude of the effect did not differ from one 
type of prime to another. In particular, it was 
interesting that facilitation with identity primes 
was not significantly larger than with inflectional 
or derivational primes. 
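
As a worked example, facilitation can be expressed as the latency after an unrelated prime minus the latency after each related prime. The short sketch below applies that subtraction to the Table 1 means; the variable names are illustrative only.

```python
# Mean lexical decision latencies (ms) from Table 1.
mean_rt = {"unrelated": 769, "identity": 701, "inflection": 709, "derivation": 710}

# Facilitation = unrelated baseline minus latency after a related prime.
facilitation = {
    prime: mean_rt["unrelated"] - rt
    for prime, rt in mean_rt.items()
    if prime != "unrelated"
}
print(facilitation)
```

The three differences (68, 60, and 59 ms) lie within 9 ms of one another, which is the pattern the post hoc comparisons describe.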

The factors of Target continuity (disrupted, con- 
tinuous), and Prime type (unrelated, identity, in- 
flectional, derivational) were examined in an 
analysis of variance. This analysis revealed that 
the effect of target continuity was not reliable, F2
(1,46)=0.43, MSe=17429, p>.50. The effect of
Prime type was significant but, as suggested by
the absence of a reliable interaction between
Prime type and continuity, F2 (3,138)=0.44,
MSe=2737, p>.50, facilitation from morphologi-
cally-related primes to orthographically disrupted
targets did not differ from facilitation to ortho-
graphically continuous targets.

Table 2. Mean lexical decision latency in milliseconds
(and SEm) for target words with disrupted and
continuous roots following primes in the four priming
conditions of Experiment 1.

                          PRIME TYPE
             Unrelated   Identity   Inflection   Derivation
Disrupted       759         699        704          709
               (19)        (14)       (14)         (16)
Continuous      785         704        715          717
               (21)        (13)       (17)         (15)

The error rate on words was very low and did 
not differ as a function of prime type, F1 (3,141)=
0.24, MSe=0.4, p>.80. Due to the design of the
experiment, facilitation due to repetition of
pseudowords could not be analyzed.

Experiment 1 had three important outcomes: a) 
The magnitude of the facilitation in lexical deci- 
sion was similar for prime-target pairs whose 
structure preserved the orthographic continuity of 
the root and for those where the continuity of the 
root was disrupted by infixing an additional letter. 
It was the case that all the disrupted roots were 
embedded in verb targets whereas all the continu- 
ous roots were in nouns and that the derivation- 
ally-related primes (but not the inflectionally-re- 
lated primes) always introduced a change in word 
class between prime and target. Nevertheless, 
statistically nonsignificant and numerically small 
differences between facilitation by inflectional and 
by derivational relatives were obtained. This out-
come suggests that the morphological repetition
effect is sensitive neither to similarity of ortho-
graphic form between the prime and the target
nor to the similarity of word class, b) Significant 
facilitation for inflectionally- and derivationally- 
related as well as for identity primes was observed 
and provided further evidence for morphological
analysis in Hebrew. However, the magnitude of 
the facilitation in lexical decision was not signifi- 
cantly greater for prime-target pairs related by in- 
flection than for pairs related by derivation. Thus, 
facilitation by repetition priming was not sensitive 
to the type of morphological relation, c) Finally, 
facilitation due to morphological relatedness in 
Hebrew cannot be attributed to repetition of an 
initial syllable. Although the initial consonant was 
always unchanged in prime and target, the follow- 
ing vowel did vary. Initial consonant and vowel 
overlap of prime and target was greater for inflec-
tions than for derivations for the nondisrupted
targets whereas the vowel never overlapped for 
the disrupted targets. Nevertheless, the pattern 
was similar for both. 

In conclusion, the results of Experiment 1 repli- 
cate effects of morphological relatedness in the 
repetition priming task when the orthographic 
integrity of the base morpheme is preserved over 
prime and target and extend the outcome to
cases where the continuity of the root morpheme 
is disrupted. In addition, it shows that the ten- 
dency for enhanced semantic overlap of inflec- 
tionally-related prime-target pairs relative to 
derivationally-related pairs contributes nothing to 
the pattern of facilitation. Collectively, these re-
sults provide no behavioral evidence for a linguis-
tic distinction between morphological types.
Moreover, they suggest that effects due to morpho-
logical relatedness are not easily interpreted as a
composite of orthographic and semantic similarity.

EXPERIMENT 2 

The results of the previous experiment revealed 
morphological analysis in the lexical decision task. 
Evidently, subjects were sensitive to repetitions of 
a sequence of consonants that comprises a root 
morpheme whether or not they form an ortho-
graphic unit. Importantly, inflectional relation-
ships and derivational relationships produced the
same pattern of facilitation. We assume that this 
outcome can be interpreted as a failure to find 
evidence for a psychological distinction between 
morphological types in Hebrew. Aspects of stimu-
lus construction in Experiment 1 permit an alter-
native account, however.

In Experiment 1, all nonwords were constructed 
from a meaningless string of consonants combined 
with a real word pattern and all words 
(necessarily) consisted of a meaningful root 
combined with an appropriate word pattern. 
Therefore, in order to perform the lexical decision 
task successfully, it was not logically necessary for 
subjects to attend to the whole word: an analysis 
of the root would have been sufficient. 
Consequently, it is possible that the failure to 
observe a difference between morphologically- 
related primes with inflectional and derivational 
word patterns reflected the tendency of subjects to
ignore perceptually nonsalient vowel information 
in this experimental setting. It was essential to 
show that subjects were, in fact, sensitive to the 
word patterns that create the distinction between 
inflectional and derivational formations and this 
was the intent of the second experiment.

In Experiment 2, the informativeness of word 
pattern information was enhanced by constructing 
pseudowords along a different principle. Here, 
pseudowords consisted of a real root and a real 
word pattern in an illegal combination. The words 
consisted of the same items as in the previous 
experiment. The differentiation between word and 
pseudowords therefore required the subject to 
process the word pattern as well as the root. As in 
the previous experiment, word targets were 
preceded by identity, unrelated, inflectionally- and 
derivationally-related primes.

Method 

Subjects. Forty-eight first year students from 
the Department of Psychology at Hebrew
University participated in Experiment 2. As in the
previous experiment, all were native speakers of 
Hebrew. All had vision that was normal or cor- 
rected-to-normal and all had prior experience in 
reaction-time studies although none participated 
in other experiments in the present study. 

Stimulus materials. The words used in the 
present experiment were identical to those used in 
Experiment 1. There were 48 sets, each comprised
of a target, an unrelated word, a derivationally- 
related word, and an inflectionally-related word. 
Half of the targets contained orthographically 
continuous roots and half contained roots that 
were disrupted by the infixation of the vowel /o/ 
which is represented by the letter ו. Both
inflectional and derivational primes always 
included the full root morpheme in a continuous 
form, and primes were matched for orthographic 
similarity with the target.

The 96 pseudowords were constructed using
other productive roots that exist in the language.
All roots were combined with legal word patterns
such that the particular combination of root and
word pattern was meaningless. For example, the
root ע-ב-ד (A-V-D) was combined with the word
pattern -o-a-ut in order to form the phonologically
legal but meaningless structure עובדנות, which
is pronounced /ovdanut/. This manipulation was
introduced so as to promote morphological 
analysis of all letter strings. 
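
The construction principle for these pseudowords, a real root and a real word pattern in an unattested combination, can be sketched as a filter over root-pattern pairs. Everything in the sketch is assumed for illustration: the helper names, the romanized toy inventory, and in particular the set of "attested" forms, which is a small made-up list and not a Hebrew lexicon, so absence from it says nothing about whether a form exists in the language.

```python
from itertools import product


def fill(root, pattern):
    """Put the root consonants into the "-" slots of a word pattern."""
    consonants = iter(root.lower().split("-"))
    return "".join(next(consonants) if ch == "-" else ch for ch in pattern)


def unattested_combinations(roots, patterns, attested):
    """Root-pattern combinations that are absent from `attested`."""
    candidates = (fill(root, pattern) for root, pattern in product(roots, patterns))
    return [word for word in candidates if word not in attested]


# Toy inventory (hypothetical): two roots and four patterns from the text.
roots = ["Z-M-R", "SH-M-N"]
patterns = ["-a-a-", "-e-e-", "-i-e-", "-a-e-"]
attested = {"zamar", "zemer", "zimer", "shemen", "shamen"}
print(unattested_combinations(roots, patterns, attested))
```

Because every candidate contains a real root, a decision based on the root alone can no longer separate words from pseudowords; the reader has to evaluate the combination, which is the purpose of the manipulation.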

The four test orders created for Experiment 1 
were modified so that a new set of nonwords was 
substituted for the old set. In all other respects 
the materials were identical to those of the 
previous experiment.

Procedure. Subjects were instructed to make a 
lexical decision judgment. The procedure as well
as the word stimuli were identical to those of
Experiment 1 except that timing was
measured from a hardware device that eliminated
the constant that had been added to each latency
in the previous experiment.

Results & Discussion 

Mean lexical decision latencies were calculated 
in each condition, across subjects and across 
stimuli. Errors and extreme reaction times were 
eliminated according to the constraints described 
for Experiment 1. Mean reaction times and errors 
for each condition are presented in Table 3.

The comparison of the latencies of lexical 
decisions to target words in the different 
conditions was based on ANOVA using subjects 
(F1) and stimuli (F2) as random factors.

Table 3. Mean lexical decision times in milliseconds
and percentage of errors for morphologically-related
and unrelated target words in Experiment 2 (SEm in
parentheses).

                        PRIME TYPE
          Unrelated   Identity   Inflection   Derivation
RT           628         571        576          578
            (12)         (6)        (6)         (10)
Errors       1.9         1.4        1.9          1.5
           (0.4)       (0.3)       (0)         (0.4)

This analysis showed a significant effect of type
of prime [F1 (3,141) = 9.98, MSe = 1826, p < .0001
and F2 (3,141) = 20.71, MSe = 1744, p < .0001].
Post hoc Tukey-A comparisons of the means
indicated that the inflectional, derivational, and
identity primes all facilitated lexical decision
relative to the unrelated condition (p < .01). As
in Experiment 1, the magnitude of facilitation was
similar for the three related prime types. The
analysis of error scores showed no significant
difference due to type of prime [F1 (3,141) = 1.49,
MSe = 1.54, p > .14].
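
The two analyses differ only in how trial latencies are averaged before the ANOVA is run: F1 averages over items within each subject, and F2 averages over subjects within each item. A minimal Python sketch of that aggregation step follows. The record layout, the function name `condition_means`, and the latencies are illustrative assumptions, not the software or the data used in the experiment.

```python
from collections import defaultdict
from statistics import mean


def condition_means(trials, unit):
    """Average latencies per (unit, prime type) cell.

    trials: iterable of dicts with keys 'subject', 'item', 'prime', 'rt'
            (a hypothetical layout chosen for this sketch).
    unit:   'subject' builds the by-subjects (F1) table,
            'item' builds the by-items (F2) table.
    """
    cells = defaultdict(list)
    for trial in trials:
        cells[(trial[unit], trial["prime"])].append(trial["rt"])
    return {key: mean(rts) for key, rts in cells.items()}


# Made-up latencies (ms) for illustration only; not data from the study.
trials = [
    {"subject": "s1", "item": "t1", "prime": "unrelated", "rt": 640},
    {"subject": "s1", "item": "t2", "prime": "identity", "rt": 560},
    {"subject": "s2", "item": "t1", "prime": "identity", "rt": 580},
    {"subject": "s2", "item": "t2", "prime": "unrelated", "rt": 620},
]

by_subject = condition_means(trials, "subject")  # input to an F1 analysis
by_item = condition_means(trials, "item")        # input to an F2 analysis
```

Each table would then be submitted to its own ANOVA with prime type as the factor of interest.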

The responses to targets with orthographically
disrupted roots and targets with continuous roots
were compared by a mixed model ANOVA and are
summarized in Table 4. Targets with continuous
roots were marginally faster than (different)
targets with disrupted roots [F2 (1,46) = 3.13,
MSe = 8787, p < .084]. The effect of prime type
was reliable and, consistent with the outcome of
Experiment 1, there was no interaction between
type of prime and target continuity
[F (3,138) = 0.78, MSe = 2459, p > .50]. Because
the pattern of facilitation was similar for targets
with orthographically disrupted and
orthographically continuous roots, these data
support the conclusion of Experiment 1 that
preservation of orthographic pattern is not a
necessary condition for facilitation due to
morphological relatedness. Finally,
inflectionally-related primes and
derivationally-related primes did not differ
significantly from each other, either for disrupted
roots or for continuous roots.
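
As a worked illustration, facilitation can be expressed as the unrelated-prime mean minus the mean for each related prime type. The sketch below applies that subtraction to the Table 4 means; the dictionary layout and the function name `facilitation` are assumptions made for the example.

```python
# Mean lexical decision latencies (ms) copied from Table 4.
table4 = {
    "disrupted": {"unrelated": 640, "identity": 575,
                  "inflection": 584, "derivation": 595},
    "continuous": {"unrelated": 602, "identity": 563,
                   "inflection": 568, "derivation": 561},
}


def facilitation(means):
    """Return unrelated-minus-related latency (ms) for each related prime type."""
    baseline = means["unrelated"]
    return {prime: baseline - rt
            for prime, rt in means.items() if prime != "unrelated"}


effects = {root_type: facilitation(means) for root_type, means in table4.items()}
for root_type, by_prime in effects.items():
    print(root_type, by_prime)
```

With these means, facilitation ranges from 45 to 65 ms for targets with disrupted roots and from 34 to 41 ms for targets with continuous roots.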

The outcome of the present experiment
replicated that of Experiment 1. The magnitude of
facilitation in lexical decision was not
significantly greater for prime-target pairs
related by inflection than for pairs related by
derivation. More important, facilitation was not
influenced by the orthographic integrity of the
repeated root morpheme. Thus, even when the
composition of pseudowords forced subjects to
analyze the morphological structure of the items
in order to perform the lexical decision task,
facilitation in morphological repetition priming
was sensitive neither to (a) the type of
morphological relation nor to (b) the preservation
(or disruption) of an orthographic pattern for the
morpheme across prime and target words.

Table 4. Mean lexical decision latency in milliseconds
(and SEm) for target words following primes with
disrupted and continuous roots in the four priming
conditions of Experiment 2.

                         PRIME TYPE
TARGET      Unrelated  Identity  Inflection  Derivation
Disrupted     640        575        584         595
              (22)       (7)        (9)         (17)
Continuous    602        563        568         561
              (11)       (9)        (7)         (10)

Plausible accounts of facilitation in the
repetition priming task have identified
response-related (episodic) as well as lexical
influences (e.g., Bentin & Feldman, 1990; Bentin &
Moscovitch, 1988; Bentin & Peled, 1990; Forster &
Davis, 1984; Monsell, 1985). One account of the
present results places the locus of facilitation at
the level of the root morpheme that is repeated in
both inflectional and derivational pairs. Perhaps
repetition serves to facilitate the identification
of an orthographically and semantically abstract
root within the composite root plus word pattern
that constitutes a word. Conjointly, facilitation
may reflect a redundancy that was present in our
previous experiments: the lexical decision response
to a root was also repeated. That is, roots that
were parts of words on their first presentation
were parts of words on their second presentation.
It never was the case that roots that were parts of
pseudowords on their first presentation were parts
of words on their second presentation. This
redundancy between roots and responses might have
facilitated the decision process or the selection
between the "word" and "not a word" response
categories, thereby introducing an additional
source of facilitation. In the third and final
experiment, the lexical decision associated with a
particular root was manipulated over repetitions.
The experiment was designed to identify an episodic
component of facilitation associated with response
repetition.

EXPERIMENT 3

Pseudoword structure influences rejection time
in the lexical decision task. Caramazza, Laudanna,
and Romani (1988) reported that Italian
pseudowords composed of illegal combinations of
real morphemes were harder to reject than
pseudowords composed of one legal morpheme and
one illegal (nonmorpheme) sequence. Similar
results have been reported in English (Katz,
Rexer, & Lukatela, 1990). Of course, Italian is a
concatenated language like English, and its
morphemes consist of uninterrupted sequences of
letters, whereas in Hebrew the root morpheme is a
more abstract unit. Experiment 3 assesses
whether Hebrew pseudowords formed around a
meaningful root pose special problems relative to
pseudowords formed around a meaningless string
of consonants. Hebrew pseudowords constructed by
combining meaningful root morphemes with real
word patterns were compared with pseudowords
constructed of a meaningless root with a real word
pattern.

Experiment 3 also attempts to evaluate
response repetition as a source of facilitation in
this task. In the experiments reported above as
well as in all previously reported repetition
priming studies, the lexical status of the prime
and the lexical decision to the target were
matched so that if the answer to the first was
"word" then the answer to the second would also
be "word," and if the answer to the first was
"pseudoword" then the answer to the second
would also be "pseudoword." In the present
experiment, the effect of morphologically-related
pseudoword primes on word targets was
investigated. That is, primes and targets were
always formed around the same root but, due to
illegal combinations of root and word pattern, the
prime was not always a real word. Failure to find
facilitation when the lexical status of prime and
target is not matched would provide evidence for
a response-related component of facilitation in
the repetition priming task.

The addition of a condition in which pseudoword
primes are followed by word targets serves to
eliminate another potential problem of
interpretation. In the previous two experiments,
only words were repeated so that it was possible
that subjects used repetition of the root as a
criterion for deciding the lexical status of a
letter string. That is, if a particular string of
consonants had been presented previously then
respond "word." By this account, target
facilitation relative to unrelated primes would be
overestimated, as targets following unrelated
primes were first presentations of that consonant
string. Accordingly, targets following pseudoword
primes formed from the same root should show
facilitation because the root is repeated. By
contrast, if targets following unrelated primes
(with different roots) and targets following
pseudoword primes (repeated roots) do not differ
significantly, then it is unlikely that subjects
are exploiting repetition of the root per se as a
basis for judging the lexical status of a target.

Experiment 3 was designed to differentiate the
effect of repeating a root morpheme from the effect
of repeating a lexical decision response (cf. Logan,
1989). If facilitation following morphological
repetition reflects units for accessing the lexicon
rather than lexical processes, then responses to
target words that contain a previously presented
root should be faster than responses to targets
whose roots were presented for the first time.
Importantly, the lexical status of the string in
which the root first appeared should have no
effect. That is, both word and nonword primes that
contain the root morpheme should facilitate
targets. On the other hand, if morphological
components must activate a lexical entry in order
to produce facilitation, then roots embedded in
pseudowords will not facilitate words with those
same roots. Such an outcome could also suggest
that relatively late processes of decision and
response selection contribute to the pattern of
facilitation in the repetition priming task.

Method 

Subjects. Forty-eight first-year students from
the Department of Psychology at Hebrew
University participated in Experiment 3. As in the
previous experiments, all were native speakers of
Hebrew. All had vision that was normal or
corrected-to-normal and all had prior experience
in reaction-time studies, although none had
participated in other experiments in the present
study.

Stimulus materials. The materials from
Experiment 1 were modified in the third
experiment so that the response for a particular
root was not necessarily constant over first and
second presentations of that root. The materials
for the third experiment were identical to those of
the previous two experiments with two exceptions.
First, instead of including an identical repetition
of each target word, a new prime was constructed.
It consisted of an illegal combination of the target
root and a word pattern. Accordingly, the correct
lexical decision response for these primes was
"not a word." As a consequence of introducing a
new principle for constructing primes, two types of
pseudowords occurred within each test order. One
type consisted of pseudowords formed by creating
an illegal combination of meaningful root and real
word pattern. These were pseudoword primes for
real word targets and twelve existed in each list.
The other consisted of a real word pattern on a
meaningless root and these were pseudoword
fillers. Both types of pseudowords were presented
to each subject so that they could be compared.

Thus, the design of Experiment 3 was
similar to the design of Experiment 1, except that 
here the identity word primes were replaced by 
pseudoword primes constructed from the same 
root morpheme that appeared in the target. 
Within each of the four test orders, the forty-eight 
targets were preceded equally often by 
pseudowords, by inflected primes, by derived 
primes, and by unrelated word primes. Across test 
orders, each target was preceded by each type of 
prime. 
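
The counterbalancing described above amounts to a Latin-square rotation of prime types over test orders. The sketch below shows one way to generate such an assignment. The rotation rule, the integer target labels, and the function name `build_test_orders` are assumptions for illustration; the text does not specify how targets were actually allocated to conditions.

```python
from collections import Counter

PRIME_TYPES = ["pseudoword", "inflection", "derivation", "unrelated"]
N_TARGETS = 48


def build_test_orders(n_targets=N_TARGETS, prime_types=PRIME_TYPES):
    """Rotate prime types over targets so that each test order pairs every
    target with a different prime type (a hypothetical allocation rule)."""
    k = len(prime_types)
    orders = []
    for order_index in range(k):
        assignment = {
            target: prime_types[(target + order_index) % k]
            for target in range(n_targets)
        }
        orders.append(assignment)
    return orders


orders = build_test_orders()

# Within a test order, each prime type precedes the same number of targets.
print(Counter(orders[0].values()))

# Across test orders, a given target is preceded by every prime type once.
print(sorted(order[0] for order in orders))
```

With 48 targets and four prime types, each test order contains twelve targets per prime type, which matches the twelve pseudoword primes per list mentioned in the materials.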

Procedure. Subjects were instructed to make a
lexical decision judgment and the procedure and 
instructions were identical to those of the two 
previous experiments. 

Results and Discussion 

Mean decision latencies and error rates for 
Experiment 3 are summarized in Table 5. Errors 
and extreme reaction times were eliminated 
according to the same constraints used in previous 
experiments. 

The ANOVA of word latencies revealed a
significant effect of type of prime [F1 (3,141) =
5.76, MSe = 2016, p < .001; F2 (3,141) = 11.21,
MSe = 2482, p < .0001] although the analysis of
error scores did not [F1 (3,138) = 1.13,
MSe = 1.14, p > .33].

Table 5. Mean lexical decision times in milliseconds
(and SEm) and percentage of errors for
morphologically-related, unrelated target words and
for pseudowords in Experiment 3.

                        PRIME TYPE
            Unrelated  Identity  Inflection  Derivation
RTs (SEm)     653        647        608         609
              (13.5)     (10.1)     (7.30)      (7.60)
Errors        2.4        1.9        1.7         1.6
              (0.7)      (0.5)      (0.5)       (0.4)

For latencies, post-hoc Tukey-A revealed that
targets preceded by inflectionally- and
derivationally-related primes were significantly
faster than targets preceded by unrelated words.
In replication of previous results, the magnitude
of facilitation was similar for inflectional and
derivational type primes. Reaction times to
targets preceded by pseudoword primes were not
significantly different from reaction times to
targets preceded by unrelated primes, however.
This outcome suggests that when the response to
a root was not repeated, repetition of the root per
se was not sufficient to facilitate (or inhibit)
lexical decision. This outcome is important because
it suggests that word target responses were not
simply facilitated because the same root was
repeated during the experimental session.
Facilitation necessitated activation of a lexical
entry.

Comparison of the meaningful root and
meaningless root pseudowords revealed that the
presence of a meaningful root delayed rejections of
pseudowords by about 200 ms (902 ms vs. 698 ms
for meaningful-root and meaningless-root
pseudowords, respectively). This difference was
statistically significant [F1 (1,47) = 88.5,
MSe = 11280, p < .0001] and is consistent with the
results found in concatenated languages such as
Italian and English.

As in the previous experiments, latencies and
errors to targets containing disrupted and
continuous roots were compared. They are
summarized in Table 6. The ANOVA showed that
continuous and disrupted target types were not
significantly different [F2 (1,46) = 1.78,
MSe = 9960, p > .18]. In replication of previous
experiments, the effect of type of prime was
significant but there was no interaction between
type of prime and continuity [F2 (3,138) = 0.88,
MSe = 2640, p > .44].

Table 6. Mean lexical decision latency in milliseconds
(and SEm) for target words with disrupted and
undisrupted roots in the four priming conditions in
Experiment 3.

                         PRIME TYPE
TARGET      Unrelated  Identity  Inflection  Derivation
Disrupted     634        638        609         597
              (17)       (14)       (11)        (9)
Continuous    661        650        613         631
              (19)       (14)       (10)        (12)



The difference in the lexical decision latency
between the two types of pseudowords suggests
that during the process of lexical decision, roots
were examined and that readers cannot ignore the
meaningfulness of the roots even when they are
components of pseudowords. Nevertheless, the
presence of a meaningful root in a pseudoword
could not facilitate later lexical decision to a word
formed from the same root. Because only the
pseudoword-word combination was examined, this
outcome could suggest that repetition of the
episode or particular response is a source of
facilitation in the repetition priming task.
Alternatively, it is plausible that morphological
components must activate a lexical entry in order
to produce facilitation at a later point. In any
event, it appears that the locus of root facilitation
cannot be prelexical.

GENERAL DISCUSSION 
In a series of three lexical decision experiments,
significant facilitation due to morphological
relatedness of prime and target was observed with
Hebrew materials. Subjects performed a lexical
decision to both prime and target and 7 to 13
items intervened between them. When related
primes were matched for overall orthographic
similarity to targets, facilitation by inflectional
primes was equivalent to facilitation by
derivational primes, both of which were
statistically equivalent to facilitation by identical
repetitions. The similar magnitude of facilitation
for the two types of morphological primes is
interesting because forms related by derivation
generally tend to be less similar in meaning than
forms related by inflection (Aronoff, 1976).
Moreover, in our particular experiments, pairs
related by inflection were always of the same
word class whereas pairs related by derivation
changed word class. Evidently, the facilitation
that underlies repetition priming among
morphologically-related forms cannot reflect
preservation of shared meaning over prime and
target. These results are consistent with the claim
that at long lags, semantic relatedness per se is
not a primary source of facilitation in the
repetition priming task (Bentin & Feldman, 1990),
and support a distinction between facilitation due
to associative and morphological relatedness
(Henderson, 1985).

Alternative accounts of facilitation between
morphologically-related prime-target pairs
emphasize the repetition of phonological and
orthographic patterns conveyed by a shared
morpheme. As described above (see also Berman,
1978), and in contrast to concatenated morphologies
such as that of English, morphologically complex
words in Hebrew consist of a root morpheme of
consonants into which a word pattern is infixed.
Consequently, root morphemes are abstract
patterns that cannot be realized as unified
phonological entities. In the present study, roots
were repeated over related prime and target but,
because word patterns changed, related words were
not associated with a common phonological
structure. Nevertheless, facilitation was observed.
In conclusion, appreciation of morphological
relatedness does not require phonological identity.
As applied to the repetition priming task,
repetition of a phonological unit is not necessary
in order to produce morphological facilitation.

Accounts of morphological effects that
emphasize orthographic structure (e.g.,
Seidenberg, 1987; Seidenberg & McClelland, 1989)
may be more appropriate for concatenated
languages because morphemes tend to be
orthographic as well as linguistic units. For
example, in English, the base morpheme is
typically undisrupted by morphological
manipulations. Nevertheless, previous studies in
English have demonstrated that for
morphologically-related words, the repetition of
orthographic form plays only a minimal and
statistically nonsignificant role in the
morphological repetition effect (e.g., Napps, 1989;
Napps & Fowler, 1987). Similarly in Serbo-Croatian,
facilitation in repetition priming was numerically
equivalent when prime and target were both
written in the same alphabet (e.g., NOGOM - NOGA)
and when prime was in one alphabet (e.g.,
nOgA) and target was in the other (e.g., NOGA)
(Feldman & Moskovljević, 1987; Feldman, in
press). In Hebrew, the root is always
phonologically and sometimes also orthographically
disrupted because of its nonconcatenated structure.
The major contribution of the present result is to
underscore the limitations of an orthographic
account of morphological analysis. This claim is
based on the following evidence.

First, the magnitude of target facilitation
following morphological relatives was similar to
that following identical repetitions although the
orthographic similarity of the inflected and
derived primes to their matched targets was, by
definition, smaller than with identity primes.
Second, in all three experiments, the comparison
between prime-target pairs with orthographic
disruptions to the root and pairs with continuous
roots yielded no significant differences. Moreover,
in Experiment 3, repetition of the root did not
facilitate lexical decision to the target if its first
presentation was in the context of a pseudoword,
even though the pseudoword was as
orthographically similar to the target as were the
related words. These results are consistent with
the outcome of a similar study conducted with
English materials (Fowler et al., 1985) in that
changes in spelling (and/or pronunciation) had no
effect on the pattern of facilitation between
morphologically-related prime-target pairs in the
repetition priming task. The implication of the
above is that facilitation in the repetition priming
task in nonconcatenated as well as concatenated
languages cannot be attributed to repetition of an
overall orthographic form nor to preservation,
over successive presentations, of the continuity of
an orthographic pattern. In summary,
morphological analysis cannot be tied to
orthographic units.

Inflections and derivations are contrasted by
linguists as representing two different types of
morphological formations. In English, inflectional
affixes are few and tend to be composed of three or
fewer letters whereas derivational endings can be
composed of a more variable number of letters.
Moreover, some derivations change the meaning
and pronunciation of the base morpheme in a
manner that is not characteristic of inflections
(Chomsky & Halle, 1968). In Hebrew, it is possible
to find inflectional and derivational relatives of a
target that modify the structure of the root to a
similar degree although they necessarily differ
with respect to their semantic similarity to the
target. In the present repetition priming study, no
differences between inflectional and derivational
types of morphological formations were observed.
Consistent with the conclusion of Napps (1989)
and Napps and Fowler (1987), it is evident that
facilitation due to morphological relatedness in
the present study does not represent the
convergence of semantic, orthographic, and
phonological relationships.

Locus of morphological effects 

In order to observe morphological facilitation in
lexical decision, it is not necessary that
orthographic pattern be preserved and this
finding has been interpreted to mean that
morphological analysis is not tied to an
orthographic pattern. Similarly, facilitation
patterns are not sensitive to the semantic overlap
of prime and target in either this or an earlier
study (Feldman, 1992). Because the morphological
character of a word cannot be captured by its
orthographic and semantic properties, it seems
that the morphological structure in general and
the Hebrew root morpheme in particular must be
represented. A morphological representation in
the lexicon has been proposed by several
investigators (e.g., Grainger, Colé, & Segui, 1991).

The claim that morphological effects in word
recognition reflect lexical processes is based on
several sources of evidence. Typically, effects of
repeating a morpheme are numerically larger and
statistically more robust for word than for
pseudoword prime-target pairs. Significant
facilitation for pseudowords in the repetition
priming task is unreliable even when the negative
lexical decision is repeated over prime and target
with the same continuous base morpheme (e.g.,
Duchek & Neely, 1989; Feldman & Moskovljević,
1987). For example, in the one repetition priming
study where Hebrew pseudowords were repeated
(Bentin & Feldman, 1990), evidence for
facilitation due to repetition with pseudowords
depended on the choice of a baseline. Similarly, in
at least one study with English materials (Fowler
et al., 1985), evidence of facilitation with
pseudowords depended on the number of items
intervening between prime and target (see also
Scarborough, Cortese, and Scarborough, 1986).
For morphologically related word pairs, by
contrast, effects tend to be larger in magnitude
and manipulations of lag are not significant
(Feldman, in press). The results of Experiment 3
also cast doubt on a locus for the morphological
facilitation that is independent of the lexicon. If
it were possible for subjects to extract a root from
both words and pseudowords prior to accessing the
lexicon, then the effect on word targets of word
and pseudoword primes should have been similar.
Analogous effects for word and pseudoword
primes were not observed, however.

A second source of evidence that (at least some)
morphological effects are lexical in origin is the
interaction of morphological with frequency
effects. Although it is not the case in repetition
priming that (relative) frequency of
morphologically-related prime and target had a
significant effect (Feldman, 1992), morphological
and frequency effects often interact in other
recognition tasks. Accordingly, more frequent
words are less sensitive to manipulations of
morphological structure than are less frequent
words. For example, in an experimental
production task (Stemberger & MacWhinney,
1986; 1988), the error rate on lower-frequency
morphologically-complex forms was significantly
higher than on higher-frequency verb forms.
Similarly, it has been suggested (Caramazza et
al., 1985) that both whole word and morphological
units may constitute viable units for accessing the
lexicon but that the availability of the former is
constrained by the frequency of the particular
surface form.

It is important to point out that the measure of
variance included in Table 5 provides no evidence
that performance was more variable in the
pseudoword prime condition than in the unrelated
prime condition. Therefore, an account based on
compensatory processes, such as facilitation due to
repetition of the root being offset by a change of
response to that root, seems implausible.

Evidence that facilitation due to morphological
relatedness is lexical in locus is compelling and
fits well with the results of studies that used
different experimental paradigms. What is less
obvious is how to account for the effect of
morphemic composition on pseudoword rejection
latencies. Rejection latencies were prolonged for
pseudowords that included a meaningful root
relative to pseudowords that did not. This outcome
for real roots in illegal combinations with word
patterns could reflect a relatively late and
strategic re-evaluation of the decision process
analogous to the spelling check necessary for
pseudohomophone rejection.

Recently, Grainger et al. (1991) have identified
two plausible lexical loci for morphological effects
in word recognition. As usually conceived,
morphological effects are interpreted as sublexical
in origin so that morphological relatedness is
represented as a system of facilitatory connections
between lexical entries for morphologically-related
words or as a pattern of activation among
morphological units at a level intermediate
between word and letter level units. Whether
interpreted as a system of connections between
whole word forms or as patterns of activation
among shared morphological units, the traditional
locus of morphological relatedness is sublexical
(but not prelexical) in that it is intermediate
between word and letter levels. As noted by
Grainger and his colleagues (1991), according to a
sublexical account, one ought to expect to observe
inhibition among morphologically-related words
because of their shared orthographic structure but
this outcome has not been reported. Alternatively,
morphological units may be represented at a level
above the word so that all words formed from the
same base morpheme are linked by facilitatory
connections to the morpheme and conversely, from
the morpheme back to related words. By the
supralexical account, activation spreads from a
specific word to its base morpheme and then on to
other words that are morphologically related to it.
An extension of the supralexical account is
consistent with the claim that facilitation in the
repetition priming task with Hebrew materials
may reflect the process of extracting the root from
the root plus word pattern combination that
constitutes a word (Bentin & Feldman, 1990). It
also alleviates the problem of identifying a
morpheme which, in Hebrew, is neither a
phonological nor an orthographic entity.
Segmenting root from word pattern in Hebrew
necessarily requires extensive lexical knowledge;
therefore the process of root extraction in Hebrew
must be distinguished from prelexical processes
such as affix stripping (Taft & Forster, 1975).
Almost all Hebrew pseudowords have legal
orthographic (and phonological) patterns so that
their differentiation from words must entail
examination of the root and may even include an
evaluation of its semantic content. This
identification may require extracting the root from
the word. It is plausible that when roots are
repeated over prime and target words in
repetition priming, it is the identification of the
root that is facilitated. Of course, even the
extraction of a semantically meaningful root from
its word context is not sufficient to reliably
categorize a string as a word. The combination of
root morpheme and word pattern must also be
evaluated. It was observed in Experiment 3 that
pseudowords composed of a meaningful root in
illegal combination with a word pattern were more
difficult to reject than pseudowords formed around
a meaningless root. Activation from the root could
spread down to letter level even in the absence of
word level activation and this pattern of activation
throughout the system could have the effect of
biasing the decision process toward a word
response.
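
The supralexical account can be illustrated with a toy spreading-activation sketch in which activation passes from a word to its root node and from there to the other words sharing that root. Everything in the sketch is hypothetical: the placeholder word and root labels, the connection weight of 0.5, and the function name `spread_from_word` are assumptions chosen for illustration, not a model proposed or tested in the text.

```python
# Placeholder lexicon: each real word is linked to a root node above it.
WORD_TO_ROOT = {
    "word_a1": "root_a",
    "word_a2": "root_a",
    "word_b1": "root_b",
}


def spread_from_word(prime, word_to_root, weight=0.5):
    """Return the activation that reaches each other word via the root node.

    A prime without a lexical entry (a pseudoword) sends no activation
    upward, so no related word is facilitated.
    """
    activation = {word: 0.0 for word in word_to_root if word != prime}
    root = word_to_root.get(prime)
    if root is None:
        return activation
    root_activation = weight  # step 1: word -> root
    for word in activation:
        if word_to_root[word] == root:
            activation[word] = root_activation * weight  # step 2: root -> word
    return activation


print(spread_from_word("word_a1", WORD_TO_ROOT))   # word prime
print(spread_from_word("pseudo_a", WORD_TO_ROOT))  # pseudoword prime
```

In this sketch only a prime with a lexical entry passes activation to its morphological relatives, which parallels the absence of facilitation after pseudoword primes in Experiment 3.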

In summary, both lexical and postlexical
influences may contribute to the pattern of
facilitation in the repetition priming task. For
lexical decision, response repetition about the
lexical status of a particular morpheme in a
particular (word or pseudoword) context
constitutes a postlexical contribution. Support for
the lexical aspect of morphological analysis is tied
to the pattern of facilitation in the repetition
priming task for word targets. It could arise either
sublexically or supralexically. The
nonconcatenative morphological structure of Hebrew
lends itself to a supralexical representation of
morphology. If common morphological units are
captured at a level above the word then
discontinuities of phonological or orthographic
components of a morpheme are no longer
problematic. Prolonged latencies for pseudowords
composed of illegal combinations of root and word
pattern relative to pseudowords composed from
nonroots are also anticipated. In sum,
morphological analysis in word recognition is not
tied to orthographic form and entails lexical
knowledge at either a sublexical or a supralexical
level.

REFERENCES 

Aronoff, M. (1976). Word formation in generative grammar.
Cambridge, MA: MIT Press.

Bentin, S., & Feldman, L. B. (1990). The contribution of
morphological and semantic relatedness to repetition priming
at short and long lags: Evidence from Hebrew. Quarterly Journal
of Experimental Psychology, 42A, 693-711.

Bentin, S., & Moscovitch, M. (1988). The time course of repetition
effects for words and unfamiliar faces. Journal of Experimental
Psychology: General, 117, 148-160.

Bentin, S., & Peled, B.-S. (1990). The contribution of task-related
factors to ERP repetition effects at short and long lags. Memory
and Cognition, 18, 359-367.

Berman, R. A. (1978). Modern Hebrew structure. Tel Aviv:
University Publishing Projects.

Caramazza, A., Laudanna, A., & Romani, C. (1988). Lexical access
and inflectional morphology. Cognition, 28, 287-331.

Chomsky, N., & Halle, M. (1968). The sound pattern of English. New
York: Harper and Row.

Dannenbring, G. L., & Briand, K. (1982). Semantic priming and
the word repetition effect in a lexical decision task. Canadian
Journal of Psychology, 36, 435-444.

Duchek, J. M., & Neely, J. H. (1989). A dissociative
word-frequency x levels-of-processing interaction in episodic
recognition and lexical decision tasks. Memory and Cognition,
17, 148-162.

Emmorey, K. D. (1985). Auditory morphological priming in the
lexicon. Language and Cognitive Processes, 4, 73-92.

Feldman, L. B. (in press). Bi-alphabetism and the design of a
reading mechanism. In D. M. Willows, R. S. Kruk, & E. Corcos
(Eds.), Visual processes in reading and reading disabilities.
Hillsdale, NJ: Erlbaum.

Feldman, L. B. (1992). Morphological relationships revealed
through the repetition priming task. In M. Noonan, P.
Downing, & S. Lima (Eds.), Linguistics and literacy (pp. 239-
254). Amsterdam/Philadelphia: John Benjamins Pub. Co.

Feldman, L. B., & Fowler, C. A. (1987). The inflected noun system
in Serbo-Croatian: Lexical representation of morphological
structure. Memory & Cognition, 15, 1-12.

Feldman, L. B., & Moskovljević, J. (1987). Repetition priming is
not purely episodic in origin. Journal of Experimental Psychology:
Learning, Memory & Cognition, 13, 573-581.

Forster, K. I., & Davis, C. (1984). Repetition priming and
frequency attenuation in lexical access. Journal of Experimental
Psychology: Learning, Memory and Cognition, 10, 680-698.

Fowler, C. A., Napps, S. E., & Feldman, L. B. (1985). Relations
among regular and irregular morphologically related words in
the lexicon as revealed by repetition priming. Memory &
Cognition, 13, 241-255.

Grainger, J., Colé, P., & Segui, J. (1991). Masked morphological
priming in visual word recognition. Journal of Memory &
Language, 30, 370-384.

Hankamer, J. (1989). Morphological parsing and the lexicon. In W.
Marslen-Wilson (Ed.), Lexical representation and process (pp. 392-
408). Cambridge, MA: MIT Press.

Hanson, V. L., & Feldman, L. B. (1989). Language specificity in
lexical organization: Evidence from deaf signers' lexical
organization of ASL and English. Memory & Cognition, 17, 292-
301.

Hanson, V. L., & Wilkenfeld, D. (1985). Morphophonology and
lexical organization in deaf readers. Language and Speech, 28,
269-280.

Henderson, L. (1985). Toward a psychology of morphemes. In A.
W. Ellis (Ed.), Progress in the psychology of language (pp. 15-72).
London: Erlbaum.

Henderson, L. (1989). On the mental representation of
morphology and its diagnosis by measures of visual access
speed. In W. Marslen-Wilson (Ed.), Lexical representation and
process (pp. 357-391). Cambridge, MA: MIT Press.

Henderson, L., Wallis, J., & Knight, D. (1984). Morphemic
structure and lexical access. In H. Bouma & D. Bouwhuis
(Eds.), Attention and performance X (pp. 211-224). Hillsdale, NJ:
Erlbaum.

Katz, L., Rexer, K., & Lukatela, G. (1991). The processing of
inflected words. Psychological Research, 53, 25-31.

Kelliher, S., & Henderson, L. (1990). Morphologically based
frequency effects in the recognition of irregularly inflected
verbs. British Journal of Psychology, 81, 527-539.

Kempley, S., & Morton, J. (1982). The effects of priming with

regulariy arvl irregularly related words in auditory word 

recognition. British Journal ofPtyMogy, 73, 441-454. 
Logan, G. (1988). Toward an ii«tance theory of automatization. 

PsyMogiaa Keuiew, 95, 492-527 . 
Matthews, P. H. (1974). Morphology. Cambridge: Cambridge 

Uruvcrsity 

McCarthy, J. J. (1981). A prosodic theory of noiKoncateruitive 

moc|4K)logy. Linguistic Inquiry, 12, 375-346. 
Monsdl S. QSSS). Repetition and the lexicon. In A. W. Ellis (Ed.), 

Proems in the psychology of language (pp. 147-195). London: 

Eribaum. 

Napps, & E (1989). Morphemic relationships in the lexicon: Are 

they distinct from semantic and formal relationships? Memory 

^Cognition, 17, 729-739. 
Napps, S. E, Ic Fowler, C. A. (1987). Formal rdationships among 

words aiKl the organization of the mental lexicon. JoutjmI of 

PsychoUnguistic Research, 16, 257-272. 
Oman, U. (1971) Binyanimi, Ubsisim, Netiyot, and HaUiyot. 

minroendta, 16, 15-22. (in Hebrew) 
PrinzmeUl, W., Hoffman, H., Ic Vest, K. (1991). Automatic 

processes in word recogitttion: An analysis form illusory 

conjunctions. Journal of Experimental Psychology: Human 

Perception and Performance, 17, 902-923. 
Rapp, B. C. (1992). The nature of sublexical orthographic 

organization: The bigram trough hypothesis examined. Journal 

of Memory and Langmge. 
Scarborou^, D. L., Cortese, C. Ic Scarborough, H. S. (1986). 

Frequency and repetition effects in lexical memory. Journal of 

Experimentsl Psychology: Human Perception and Performance, 4, 1- 

17. 

Schriefers, H., Friedcrid, A., Ic Graetz, P. (1992). Inflectional and 

derivational morphology in the mental lexicon: Symmetries 

and asymmetries in repetition priming. Quarterly Journal of 

Experimental Psychok)gy, 44A, 373-390. 
Seider^g, M. (1987). Sublexical structures in visual word 

recognition: Access units or orthographic redimdancy? In M. 

Coltheart (Ed.), Attention and Performance XII: The Psychoiogy of 

Peading, (pp. 245-263). Hillsdale, NJ: Eribaum. 
Seidenberg, M., Ic McCleUand, J. L (1989). A distributed, 

developmental model of visual word recognition and naming. 

Psychciog^cal Keoiew, 96. 523-568. 
Stanners, R. F., Neiser, J. J., Hemon, W. P., Ic HuU, R (1979). 

Memory representation for morphologically related words. 

Journal cf Verbal Learning and Verbal Behavior, 18, 399-41Z 



ERLC 



BEST COPY AVAIIULE m 



Ijg Fddmttn and Bentin 



St«nb«ger, J. P., it MacWhinney, B., (1986). Fr«qutnqr end 

lexical storage of KguUrly inflaclad words. Mmoty and 

Gjjm«(m,K 17-26. 
Stembege-, J. Pv ^ MacWhfamey, B., (l^). Are taflect^i forms 

8tor«i in the lexicon? M. Hanunond ic M. hkxman (Eds.) 

Tfcflwrttai ifwrpWfljjy. San Diego, CA^^ 
Taft, M., ic Forster, K. L a^TS). Lexical storage and retrieval of 

prefixed words, founua cfVffM Lmming mtd VtM Mmvior, 

14,638^7. 



FOOTNOTES

†Also State University of New York at Albany.
††Department of Psychology, The Hebrew University, Jerusalem.
1We thank Len Katz for developing the software.
2Because the effect of continuity was not significant over items, F
values over subjects were not included.
3Exceptions include alternations of strong vowels in pairs such
as SING-SUNG and MEET-MET.






Haskins Laboratories Status Report on Speech Research
1992, SR-109/110, 179-190



Phonetic Recoding of Print and Its Effect on the Detection 
of Concurrent Speech in Amplitude Modulated Noise* 

Ram Frost†



When an amplitude-modulated noise generated from a spoken word is presented
simultaneously with the word's printed version, the noise sounds more speechlike. This
auditory illusion obtained by Frost, Repp, and Katz (1988) suggests that subjects detect
correspondences between speech amplitude envelopes and printed stimuli. The present study
investigated whether the speech envelope is assembled from the printed word or whether it is
lexically addressed. In two experiments subjects were presented with speech-plus-noise and
with noise-only trials, and were required to detect the speech in the noise. The auditory stimuli
were accompanied with matching or nonmatching Hebrew print, which was unvoweled in
Experiment 1 and voweled in Experiment 2. The stimuli of both experiments consisted of high-
frequency words, low-frequency words, and nonwords. The results demonstrated that
matching print caused a strong bias to detect speech in the noise when the stimuli were either
high- or low-frequency words, whereas no bias was found for nonwords. The bias effect for
words or nonwords was not affected by spelling-to-sound regularity; that is, similar effects
were obtained in the voweled and the unvoweled conditions. These results suggest that the
amplitude envelope of the word is not assembled from the print. Rather, it is addressed
directly from the printed word and retrieved from the mental lexicon. Since amplitude
envelopes are contingent on detailed phonetic structures, this outcome suggests that
representations of words in the mental lexicon are not only phonologic but also phonetic in
character.



It is generally assumed that the processing of 
words in the visual and auditory modalities differs 
in the initial phase because of different input 
characteristics, but converges at later stages. 
Hence, findings regarding the influence of 
orthographic information on the perception of 
speech, and findings showing how spoken 
information influences visual word perception, may suggest
how print and speech are integrated in the mental 
lexicon. 
The present study is concerned with a special form
of interaction between the visual and auditory 
modalities during word recognition. It discusses 
the possible origins of an illusion of hearing 
speech in noise caused by simultaneous 
presentation of printed information. 

The convergence of printed and spoken stimuli 
representations during processing has been 
previously demonstrated in unimodal studies. It 
has been shown that lexical decisions to spoken 
words are facilitated if successive words share the 
same spelling (Jakimik, Cole, & Rudnicky, 1980). 
Similarly, Hillinger (1980) has shown that 
priming effects with printed words were enhanced 
when primes and targets were phonemically 
similar. However, the influence of one modality on 
processing in the other modality can be shown
more directly in cross-modal studies. It has been 
established that printed words can prime lexical 
decisions to spoken words and vice versa (Hanson, 
1981; Kirsner, Milech, & Standen, 1983). 
Similarly, using the naming task, Tanenhaus,
Flanigan, and Seidenberg (1980) have 
demonstrated a visual-auditory interference in a 
Stroop paradigm. These results were interpreted 
to show that reading and listening share one 
lexicon, which allows identical messages to be 
understood in the two modalities in the same way. 

Stronger but more controversial evidence 
concerning the interaction of the visual and the 
auditory modalities comes from studies demon- 
strating cross-modal influence occurring before 
the completion of input analysis. According to a 
strongly interactive view, some or all stages of the 
perceptual process in one modality may be 
influenced by activation in the other modality. For 
example, it has been suggested that automatic 
grapheme-to-phoneme activation might occur 
prior to word recognition, thereby affecting the
process of auditory lexical access through sub-
lexical activation in the visual modality (e.g.,
Frost & Katz, 1989; Dijkstra, Schreuder, &
Frauenfelder, 1989). Dijkstra et al. (1989) have
shown that a visual letter prime can facilitate the 
auditory detection of a vowel in a syllable. 
Similarly, Layer, Pastore, and Rettberg (1990) 
have reported results showing faster identification 
of an initial auditory phoneme when congruent 
visual information was presented simultaneously.

Perceptual cross-modal influences can be shown 
at levels higher than graphemes and phonemes. 
In a recent study, Frost et al. (1988) have reported
an auditory illusion occurring when printed words
and masked spoken words appear simultaneously.
Subjects were presented with speech-plus-noise 
and with noise-only trials, and were required to 
detect the masked speech in a signal detection 
paradigm. The auditory stimuli were accompanied 
by print which either matched or did not match 
the masked speech. Since the noise used in this 
experiment was amplitude modulated, (i.e., the 
spoken word was masked by noise with the same 
amplitude envelope), when a printed word 
matched the spoken word, it also matched the 
amplitude envelope of the noise generated from it. 
Frost et al. (1988) have shown that, whether 
speech was indeed present in the noise or not, 
subjects had the illusion of hearing it in the noise 
when the printed stimuli matched the auditory 
input. These results demonstrate that subjects
automatically detected a correspondence between 
noise amplitude envelopes and printed stimuli 
when they matched. The detection of this 
correspondence made the amplitude-modulated 
noise sound more speechlike, causing a strong 



response bias. This effect was extremely reliable 
and appeared for every subject tested. The bias 
effect did not appear when the printed words and 
the spoken words from which the amplitude 
envelopes were generated were merely similar in 
their syllabic stress pattern, or phonologic 
structure. These results suggest that the printed 
words were recoded into a very detailed,
speechlike, phonetic representation that matched 
the auditory information, thereby causing the 
illusion. 

One important finding reported by Frost et al. 
(1988) relates to the processing of nonwords. 
When the printed and spoken stimuli were pseu- 
dowords (nonwords which were phonotactically 
regular), the bias to hear speech in the noise in 
the matching condition was much smaller. This 
result is of special interest because subjects could 
not identify the masked spoken stimuli, and 
therefore were unaware that they consisted of 
nonwords. Nevertheless, they could not detect a 
correspondence between a printed letter string 
and its amplitude envelope if it was not a legal 
word. One possible interpretation of this outcome 
is that in contrast to words, the covert 
pronunciation of nonwords is generated either 
pre-lexically from the print, or indirectly by
accessing similar words in the lexicon. 
Apparently, either process is too slow or too 
tentative to enable subjects to match the resulting 
internal phonetic representation to a 
simultaneous auditory stimulus before that 
stimulus is fully processed. 

However, a more radical interpretation of the 
words-nonwords differences can be suggested. It is 
possible that amplitude envelopes are stored as 
holistic patterns in the lexicon, and are addressed 
automatically by printed words. According to this 
interpretation, the bias effect could not have been 
obtained for nonwords, because nonwords are not 
represented in the mental lexicon, and their 
printed forms could not have addressed any stored 
amplitude envelope. It is important to explore this 
hypothesis further since it has direct relevance to 
models concerned with the representations of 
spoken words in the mental lexicon, and with 
models of visual lexical access. Models of spoken 
word recognition often assume that 
representations of words in the lexicon are 
phonologic in nature, and that the contact 
representations generated from the speech wave 
are abstract linguistic units like phonemes and 
syllables (see Frauenfelder & Tyler, 1987, for a
review). According to the above interpretation,
however, representations of spoken words are
maximally rich, consisting not only of abstract
linguistic units, but also of detailed phonetic 
information such as spectral templates, and 
amplitude envelopes. Amplitude envelopes cannot 
be considered phonological representations be- 
cause they do not provide the explicit phonemic or 
syllabic structure of the word. Rather, they retain
some speechlike features and convey mostly
prosodic and stress information. A similar non-
phonologic approach to the mental lexicon was
advocated by Klatt in his LAFS (Lexical Access 
From Spectra) model (Klatt, 1979; see Klatt, 1989, 
for a review; see also Gordon, 1988; Jusczyk, 
1985). 

This issue is also relevant to current discussions 
concerning the processing of printed words. 
Models of visual word perception are in 
disagreement concerning the extent of phonological
recoding during printed word recognition
(e.g., Seidenberg, 1985; Van Orden, 1987). One 
class of models assumes that phonological codes 
are generated automatically following visual 
presentation and mediate lexical access (Perfetti, 
Bell, & Delaney, 1988; Van Orden, Johnston, & 
Halle, 1988; and see Van Orden, Pennington, & 
Stone, 1990 for a review). In contrast, it has been 
suggested that phonological codes are seldom 
generated during visual word recognition, and 
that with the exception of very infrequent words, 
printed words activate orthographic units that are 
directly related to meaning in semantic memory 
(e.g. Seidenberg, 1985; Seidenberg, Waters, 
Barnes, & Tanenhaus, 1984). Thus, results 
demonstrating that a visual presentation of a 
printed word produces a detailed phonetic 
representation that includes the word's amplitude 
envelope, even when the experimental task does 
not require it, provide support for automatic and 
rapid phonetic recoding in silent reading.

The aim of the present study was to examine 
further the hypothesis that amplitude envelope
representations of spoken words are not
assembled pre-lexically from the print, but are 
stored holistically in the mental lexicon, and are 
addressed directly and automatically by matching 
printed words following lexical access. The 
generation of a phonetic representation from the 
print can theoretically be achieved through a pre- 
lexical process that maps representation of 
graphemes into representation of phonemes by 
applying grapheme-phoneme correspondence 
rules, and subsequently by transforming the 
abstract phonologic structure into a detailed 



representation for silent or overt reading. This 
process has often been suggested to characterize
the naming of novel words or of nonwords (e.g.,
Coltheart, 1978). Note that whether the
phonologic and phonetic structures are derived by
applying grapheme-phoneme correspondence rules
(Venezky, 1970), or by analogy (Glushko, 1979) is
irrelevant in the present context, since both
procedures assume that the phonologic code is
generated prior to the selection of a lexical 
candidate (i.e., prior to lexical access). In contrast 
to this account, the hypothesis forwarded in the 
present study suggests that possible differences in 
bias between words and nonwords do not result 
from the relative speed or ease with which the 
graphemic structure is transformed pre-lexically 
into a phonetic code. Rather, they emerge because 
printed words address a maximally rich lexical 
representation which contains, among other 
things, the amplitude envelope of the spoken 
word. Nonwords, on the other hand, are not 
represented in the mental lexicon, and therefore 
cannot address their amplitude envelope. 

For this purpose, the present study employed 
the speech detection task proposed by Frost et al. 
(1988) and examined whether the bias effect 
caused by matching print depends on the speed of 
print processing and on spelling-to-sound 
regularity, or whether it has a lexical origin. 
Spelling-to-sound regularity and the speed of 
generating phonological codes were manipulated 
by using word frequency and the unique 
characteristics of the Hebrew orthography. 

In the two experiments reported here, subjects 
were presented auditorily with speech-plus-noise 
or with noise-only trials, simultaneous with a visual
presentation of printed Hebrew words. In
Hebrew, letters represent mostly consonants, 
while vowels can optionally be superimposed on 
the consonants as diacritical marks. Like other 
Semitic languages, Hebrew is based on word 
families derived from tri-consonant roots. 
Therefore, many words share a similar or an 
identical letter configuration. If the vowel marks 
are absent, a single printed consonantal string 
usually represents several different spoken words. 
Thus, in its unvoweled form, the Hebrew 
orthography is considered a very deep or- 
thography: it does not convey to the reader the full 
phonemic structure of the printed word, and the 
reader is often faced with phonological 
ambiguity.^ In contrast, the voweled form is a 
very shallow writing system. The vowel marks 
convey the missing phonemic information making 
the printed word phonemically unequivocal (but
see Frost, in press, for a discussion).

Several studies in Hebrew have established that
the presentation of unvoweled print encourages
the use of orthographic codes to access the lexicon.
In order to assign a correct vowel configuration to
the printed consonants to form a valid word, readers
of Hebrew have to draw upon their lexical
knowledge. The complete phonological structure of
the printed word can only be retrieved post-lexically,
after one word candidate has been accessed
(Bentin, Bargai, & Katz, 1984; Frost, Katz, &
Bentin, 1987). In contrast, the explicit 
presentation of vowel marks provides the reader 
with the complete phonemic structure of the word 
(or nonword). Because the voweled orthography is 
characterized by grapheme-to-phoneme regularity, 
the diacritical marks enable the generation of a 
pre-lexical phonologic code by using simple 
spelling-to-sound conversion rules. This special 
characteristic of the Hebrew orthography was 
exploited in order to investigate whether the bias 
effect caused by matching print on speech 
detection in noise is affected by the print's 
phonologic transparency. Specifically, we examined
whether the bias effect is dependent on
the presentation or the omission of vowel marks. 

EXPERIMENT 1 

In Experiment 1 subjects were presented with 
high- and low-frequency spoken words, as well as 
with nonwords, which were masked by noise with
the same amplitude envelope. In addition, the
noises were presented alone. The subjects' task
consisted of deciding in each trial whether speech 
was present in the noise, or whether there was 
noise only. Simultaneous with the auditory 
presentation, a printed unvoweled Hebrew letter 
string appeared on a computer screen. Sometimes 
the printed word or nonword matched the 
auditory stimulus, and sometimes it did not. In
each of these experimental conditions it was 
determined whether the print caused a bias to 
hear speech in the noise. 

The purpose of this experiment was three-fold: 
First, to examine whether the bias effect obtained
in the shallower English orthography can be
obtained in the deeper Hebrew orthography. If the 
effect depends on the speed of generating 
amplitude envelopes pre-lexically from the print 
by using spelling-to-sound conversion rules, then
the unvoweled Hebrew is at a clear disadvantage. 
It does not convey explicitly to the reader the full 
phonemic information necessary for the 



construction of the amplitude envelope. Although
the vowel information can be retrieved from the
lexicon following visual lexical access, this process
is slower to develop. Indeed, a multilingual
comparison of naming latencies (Frost et al., 1987)
revealed that naming in unvoweled Hebrew is
slower than naming in shallower
orthographies like English and Serbo-Croatian. 
Moreover, in contrast to English and Serbo- 
Croatian, naming latencies in Hebrew were found 
to be slower than lexical decisions. This is because 
the phonemic structure necessary for naming is 
not conveyed directly by the print, but retrieved 
from the lexicon (Frost et al., 1987).

Another factor which affects the speed of 
generating phonetic codes from print is word 
frequency. Hence, the second aim of Experiment 1 
was to examine whether the bias effect, if 
obtained, depends on word frequency, or merely 
on word lexicality. If our previous differences in 
bias between words and nonwords resulted from 
the speed by which the printed words and 
nonwords were transformed into a phonetic 
structure, then one should expect a stronger effect
of bias for high-frequency words relative to low-
frequency words. This is because it is easier to
retrieve the phonetic structure of high-frequency
words (as reflected by faster RTs for these words 
in the naming task). If, on the other hand, the 
origin of the bias effect is purely lexical, then a 
bias should be obtained for all words, whether 
frequent or nonfrequent, but not for nonwords. 

Finally, the third aim of the experiment was to 
examine the bias effect in a mixed design of words 
and nonwords. Note that in the original study 
reported by Frost et al. (1988) we employed a 
blocked design. One serious handicap with our 
previous blocked design was that subjects knew in 
advance whether the auditory stimuli were words 
or nonwords. This might have encouraged the 
adoption of different strategies for words and for 
nonwords, thereby causing the differences we
obtained in the bias effect. In a mixed design such
strategies cannot be adopted. Thus, if our
previous results were caused by this 
methodological factor, then no significant 
differences in bias between words and nonwords 
should emerge in the present mixed design, and 
nonwords would show the effect as well.

Methods 

Subjects. Twenty-four undergraduate students, 
all native speakers of Hebrew, participated in the 
experiment for course credit or for payment.

Stimulus preparation. The stimuli were generated
from 24 disyllabic words and 12 disyllabic
nonwords that had a stop consonant as their ini- 
tial phoneme. The number of phonemes for all 
stimuli was either four or five. The 24 words con- 
sisted of 12 high-frequency words and 12 low-fre- 
quency words. Because there are no reliable 
sources of standard objective word frequency 
counts in Hebrew, subjective frequencies were assessed
by averaging the ratings of 50 subjects on a
1 (least frequent) to 7 (most frequent) scale. The
mean ratings of the high- and the low-frequency
words were 5.3 and 3.1, respectively. The 24 words
were all unambiguous in unvoweled print; that is,
their orthographic form represented only one lexical
entry. Thus, each letter string could be read as
a meaningful word in only one way, by assigning 
to the consonant one specific vowel configuration. 
The nonwords were, in fact, pseudowords, that 
were constructed by altering one or two phonemes 
of real words. All nonwords conformed to the 
phonotactic rules of the Hebrew language.

The auditory stimuli were originally spoken by a 
male native speaker in an acoustically shielded 
booth and recorded on an Otari MX5050 tape- 
recorder. The speech was digitized at a 20 kHz 
sampling rate. From each digitized word, a noise 
stimulus with the same amplitude envelope was 
created by randomly reversing the polarity of 
individual samples with a probability of 0.5 
(Schroeder, 1968). This signal-correlated noise 
retains a certain speechlike quality, even though 
its spectrum is flat and it cannot be identified as a 
particular utterance unless the choices are very 
limited (see Van Tasell, Soli, Kirby, & Widin, 
1987). The speech-plus-noise stimuli were created 
by adding the waveform of each digitized word to 
that of the matched noise, adjusting their relative 
intensity to yield a signal-to-noise ratio of -10.7 
dB. 
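The construction of the signal-correlated noise and of the speech-plus-noise mixture can be sketched as follows. This is only an illustration under stated assumptions: a synthetic tone stands in for the recorded speech, the signal-to-noise ratio is taken to be defined on RMS amplitudes, and the speech (rather than the noise) is scaled; the text specifies none of these details. The function names are hypothetical.

```python
import math
import random


def signal_correlated_noise(samples, rng):
    """Reverse the polarity of each sample with probability 0.5.

    Every sample keeps its magnitude, so the amplitude envelope of the
    input is preserved while its spectral fine structure is destroyed.
    """
    return [-x if rng.random() < 0.5 else x for x in samples]


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Add speech to noise after scaling the speech to the requested SNR.

    Assumption: SNR = 20 * log10(rms(scaled speech) / rms(noise)).
    """
    gain = (rms(noise) / rms(speech)) * 10 ** (snr_db / 20)
    return [gain * s + n for s, n in zip(speech, noise)]


rng = random.Random(0)          # fixed seed so the sketch is repeatable
fs = 20000                      # 20 kHz sampling rate, as in the text
# Toy stand-in for a digitized word: a 200 Hz tone with a slow envelope.
speech = [
    math.sin(2 * math.pi * 200 * t / fs) * math.sin(math.pi * t / 2000)
    for t in range(2000)
]
noise = signal_correlated_noise(speech, rng)
mixture = mix_at_snr(speech, noise, -10.7)
```

Because the polarity reversal leaves every sample magnitude unchanged, the noise and the speech have the same RMS level before mixing, so under these assumptions a ratio of -10.7 dB corresponds to scaling the speech by a factor of about 0.29.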

Each digitized stimulus was edited using a 
waveform editor. The stimulus onset was 
determined visually on an oscilloscope and was 
verified auditorily through headphones. A mark 
tone was then inserted at the onset of each 
stimulus, on a second track that was inaudible to 
the subjects. The digitized edited stimuli were 
recorded at three-second intervals on a two-track
audiotape, one track containing the spoken words 
while the other track contained the mark tones. 
The purpose of the mark tone was to trigger the 
presentation of the printed stimuli on a Macintosh 
computer screen. 

Design. Each of the high-frequency words, low-
frequency words, and nonwords was presented in
two auditory forms: (1) speech-plus-noise trials, in
which the spoken stimulus was presented masked
by noise; (2) noise-only trials, in which the noise
was presented by itself without the speech. Each
of these auditory presentations was accompanied 
by two possible visual presentations: (1) a 
matching condition (i.e., the same word or nonword
that was presented auditorily and/or that was 
used to generate the amplitude-modulated noise, 
was presented in print); (2) a nonmatching 
condition (i.e., a different word or nonword, having 
the same number of phonemes and a similar 
phonologic structure as the word or nonword 
presented auditorily, or that was used to generate 
the noise, was presented in print). Thus, there 
were four combinations of visual/auditory 
presentations for each word or nonword, making a 
total of 144 trials in the experiment. 
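The trial count follows directly from crossing the 36 items with the two auditory forms and the two visual conditions. A minimal sketch of that enumeration is given below; the item labels and dictionary keys are placeholders introduced for illustration, not the actual stimuli or any data format used in the study.

```python
from itertools import product

# 12 items of each type; the indices are placeholders for the real stimuli.
items = (
    [("high-frequency word", i) for i in range(12)]
    + [("low-frequency word", i) for i in range(12)]
    + [("nonword", i) for i in range(12)]
)
auditory_forms = ("speech-plus-noise", "noise-only")
visual_conditions = ("matching", "nonmatching")

# Full crossing: every item appears in all four auditory/visual combinations.
trials = [
    {"type": item_type, "item": index, "auditory": a, "visual": v}
    for (item_type, index), a, v in product(items, auditory_forms, visual_conditions)
]

print(len(trials))  # 36 items x 2 auditory forms x 2 visual conditions = 144
```

In this crossing each visual condition contains as many speech-plus-noise trials as noise-only trials, so the printed stimulus by itself carries no information about the correct response.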

Procedure and apparatus. Subjects were seated 
in front of a Macintosh SE computer screen (9"
diagonal screen size), and listened binaurally over
Sennheiser headphones at a comfortable intensity.
The subjects sat approximately 70 cm from the 
screen, so that the stimuli subtended a horizontal 
visual angle of 4 degrees on the average. A bold 
Hebrew font, size 24, was used. The task consisted 
of pressing a "yes" key if speech was detected in
the noise, and a "no" key if it was not. The
dominant hand was always used for the "yes"
responses. Although the task was introduced as 
purely auditory, the subjects were requested to 
attend carefully to the screen as well. They were 
told in the instructions that, when a word or a 
nonword was presented on the screen, it was 
sometimes similar to the speech or noise 
presented auditorily, and sometimes not. 
However, they were informed about the equal 
proportions of "yes" and "no" trials in each of the
different visual conditions. 
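The viewing geometry reported above implies an approximate physical width for the printed stimuli. The sketch below applies the standard visual-angle relation, width = 2 x distance x tan(angle / 2); the function name is hypothetical, and the result is an estimate derived from the reported averages, not a measurement taken from the study.

```python
import math


def stimulus_width_cm(angle_deg, distance_cm):
    """Width of a stimulus subtending angle_deg at a viewing distance of distance_cm."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


# 4 degrees of horizontal visual angle at 70 cm is roughly 4.9 cm on the screen.
width = stimulus_width_cm(4, 70)
print(round(width, 2))
```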

The tape containing the auditory stimuli was 
placed on a two-channel Otari MX5050 tape- 
recorder. The verbal stimuli were transmitted to 
the subject's headphones through one channel, 
and the trigger tones were transmitted through 
the other channel to an interface that directly 
connected to the Macintosh, where they triggered
the visual presentation. 

The experimental session began with 24 practice 
trials, after which the 144 experimental trials 
were presented in one block. 

Results and Discussion 

The indices of bias in the different experimental 
conditions were computed following the procedure 
suggested by Luce (1963). Results computed 
according to Luce's procedure tend to be very
similar to results produced by the standard signal
detection computations (e.g., Wood, 1976).
However, Luce's indices do not require any
assumptions about the shapes of the underlying
signal and noise distributions, and are easier to
compute relative to the standard measures of
signal detection theory. The Luce indices of bias
and sensitivity, originally named ln b and ln η, but
renamed here for convenience b and d, are:



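$$b = \tfrac{1}{2} \ln \left[ \frac{p(\text{yes} \mid s+n)\, p(\text{yes} \mid n)}{p(\text{no} \mid s+n)\, p(\text{no} \mid n)} \right],$$

and

$$d = \tfrac{1}{2} \ln \left[ \frac{p(\text{yes} \mid s+n)\, p(\text{no} \mid n)}{p(\text{yes} \mid n)\, p(\text{no} \mid s+n)} \right],$$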



where s+n and n stand for speech-plus-noise and
noise only, respectively. The index b assumes
positive values for a tendency to say "yes" and
negative values for a tendency to say "no." For
example, according to the above formula, in order
to obtain an average b of +0.5, the subject must
generate on the average 60 percent more positive
responses than negative ones. The index d assumes
values in the same general range as the d'
of signal detection theory, with zero representing 
chance performance. 
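The two indices can be computed directly from the proportions of "yes" responses on speech-plus-noise and on noise-only trials. The following sketch assumes that each "no" proportion is simply one minus the corresponding "yes" proportion and that no proportion equals 0 or 1 (the logarithm is undefined otherwise); the function name and the example proportions are illustrative and are not data from the experiment.

```python
import math


def luce_indices(p_yes_sn, p_yes_n):
    """Return (b, d) from the "yes" rates on speech-plus-noise and noise-only trials.

    b is the bias index (positive for a tendency to say "yes");
    d is the sensitivity index (zero at chance performance).
    """
    p_no_sn = 1 - p_yes_sn
    p_no_n = 1 - p_yes_n
    b = 0.5 * math.log((p_yes_sn * p_yes_n) / (p_no_sn * p_no_n))
    d = 0.5 * math.log((p_yes_sn * p_no_n) / (p_yes_n * p_no_sn))
    return b, d


# No bias and chance-level detection: both indices are zero.
print(luce_indices(0.5, 0.5))

# A listener who says "yes" more often than "no" on both trial types
# has a positive b; d is positive only if "yes" is more frequent when
# speech is actually present.
b, d = luce_indices(0.7, 0.6)
print(round(b, 3), round(d, 3))
```

When the "yes" rate p is the same on both trial types, d is zero and b reduces to ln[p / (1 - p)], so a b of +0.5 corresponds to a ratio of "yes" to "no" responses of about e^0.5, or roughly 1.65.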

The average values for the bias indices in each 
experimental condition are shown in Table 1 (top). 
There was a bias to say "yes" in the matching
condition for high-frequency and for low-frequency
words, whereas there was no bias in the
nonmatching condition. The bias effect found for
high-frequency words was not stronger than that
for low-frequency words. In fact, the opposite
pattern was obtained. In contrast to the high- and 
the low-frequency words, there was no bias to say 
"yes" for nonwords in the matching condition.
There was, however, a bias to say "no" in the
nonmatching condition. 

The bias indices were subjected to a two-way 
analysis of variance with the factors of word type 
(high-frequency words, low-frequency words, and 
nonwords) and visual condition (matching print, 
nonmatching print). The main effects of word type 
and visual condition were significant (F(2,46) = 17.3,
MSe = 0.48, p < 0.001, and F(1,23) = 22.0, MSe =
0.64, p < 0.001, respectively). The two-way
interaction was also significant (F(2,46) = 6.5,
MSe = 0.19, p < 0.003). A Tukey post-hoc analysis
revealed that the differences in bias between
either type of words and the nonwords
were reliable, as well as the difference between



the high- and the low-frequency words (p < 0.05).
The apparent greater bias to say "no" in the
nonmatching condition relative to the
matching condition for the nonwords was not
significant.

Table 1. Bias indices (b), and (Standard Error of the 
Means) for high-frequency words, low-frequency words 
and nonwords, when matching and nonmatching print 
is presented simultaneously with masked speech. Print 
is presented unvoweled. The top b indices were 
averaged for all subjects, whereas the bottom b indices
were averaged for the 12 subjects with the highest
detectability scores (d).

              High-Frequency   Low-Frequency   Nonwords
                  Words            Words

Match             0^5              0.94         -0.19
                 (0.14)           (0.18)        (0.14)
No Match         -0.11             0.02         -0.48
                 (0.12)           (0.14)        (0.13)

Average d = 0.15 (n = 24)

Match             0.57             0.99         -0.20
                 (0.20)           (0.19)        (0.15)
No Match         -0.35            -0.15         -0.67
                 (0.17)           (0.13)        (0.17)

Average d = 0.32 (n = 12)

The average d in the experiment was 0.15.
Hence, the signal-to-noise ratio which was
employed in the experiment resulted in a very low
level of detection.² In order to ensure that the
obtained pattern of bias was not affected by the
low detection level, the subject sample was split in
half, and the average b indices were recomputed
for those subjects with the highest d. The average b
indices for this sample in the different experimental
conditions are presented in Table 1 (bottom), and
confirm that the bias was unaffected by the level
of detection. This outcome is in accordance with
results presented by Frost and his colleagues
showing significant bias effects over a wide range
of signal-to-noise ratios.³
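
The comparison between the two visual conditions can be made explicit with a small calculation. The following is only an illustrative sketch, not part of the original analysis: it takes the bias indices reported in Table 1 (bottom) and computes the match-minus-nonmatch difference for each stimulus type. The names `BIAS` and `match_advantage` are hypothetical, and the value -0.35 is read from a damaged table cell and should be treated as an assumption.

```python
# Bias indices (b) as reported in Table 1 (bottom): the 12 subjects with
# the highest detectability scores. The -0.35 entry comes from a damaged
# table cell and is an assumption.
BIAS = {
    "high-frequency words": {"match": 0.57, "no_match": -0.35},
    "low-frequency words": {"match": 0.99, "no_match": -0.15},
    "nonwords": {"match": -0.20, "no_match": -0.67},
}


def match_advantage(table):
    """Return the match-minus-nonmatch bias difference per stimulus type."""
    return {
        stimulus: round(cells["match"] - cells["no_match"], 2)
        for stimulus, cells in table.items()
    }


if __name__ == "__main__":
    for stimulus, difference in match_advantage(BIAS).items():
        print(f"{stimulus}: {difference:+.2f}")
```

Under these values the difference between the visual conditions is largest for the low-frequency words and smallest for the nonwords, which is the ordering described in the text.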

The data of Experiment 1 thus reveal that the
bias in the visual matching condition was obtained
even in the unvoweled Hebrew orthography.
However, similar to our previous study, this
effect can be demonstrated only when legal words
are presented in the visual modality. The bias
effect was not reduced when the printed words were
relatively infrequent. This suggests that the speed
by which the phonetic structure of the word is
retrieved does not affect the illusion of hearing
speech in the noise in the matching condition.⁴
The unexpected stronger effect of bias obtained for
the low-frequency words may possibly be related
to the phonetic features of the words employed.
This possibility will be further considered in the
General Discussion.

The most significant outcome of the experiment
is that there was no bias in the matching condition
for nonwords. Since in the present study a
mixed design was employed, this effect cannot be
attributed to a uniform "set" strategy adopted for
the nonwords. Although there was a greater
tendency to say "no" when nonwords appeared in
the nonmatching condition relative to the
matching condition, this tendency was not found
to be statistically reliable. These results suggest,
then, that in contrast to words, the presentation of
printed nonwords did not easily invoke a phonetic
representation which could be compared to the
amplitude envelopes presented auditorily. Hence,
the outcome of Experiment 1 lends support to the
hypothesis that the bias to say "yes" in the
matching condition for words only, regardless of
their frequency, results from the automatic
retrieval of their amplitude envelopes from the
lexicon. This process does not appear to be
affected by factors related to the speed of
generating a phonetic code.

EXPERIMENT 2 
One possible criticism of the results of
Experiment 1 is that in the unvoweled Hebrew
orthography the phonemic structure of printed
words can be retrieved from the mental lexicon
following visual access. In contrast, the phonemic
structure of nonwords cannot be determined
unequivocally, since the printed consonants do not
specify exactly how a nonword should be read. It
might be argued that this caused the different
pattern of bias found for words relative to
nonwords. According to this interpretation, the
amplitude envelopes of words were not stored as
such in the lexicon, but generated on-line from
more abstract phonologic or phonetic structures
which were retrieved post-lexically for the words.
Because nonwords are not represented in the
lexicon, and because the complete phonetic
structure of the nonwords was not specified by the
unvoweled print, no bias was obtained for
nonwords.

In order to ascertain that this factor did not
affect our previous findings, in Experiment 2 the
effect of bias was measured when the printed
stimuli were voweled. With the diacritical
vowel marks added to the consonants, the Hebrew
orthography is as shallow as other orthographies
which have a clear and unequivocal mapping of
spelling-to-sound (e.g., Serbo-Croatian). The
marks convey the full phonemic information that
is necessary to produce a pre-lexical phonologic
code for both words and nonwords. Therefore, the
explicit presentation of vowels eliminates the
superiority of words over nonwords in regard to
phonologic and phonetic processing: Phonologic
recoding of both words and nonwords can easily
and unequivocally occur through a fast pre-lexical
process by applying grapheme-to-phoneme
correspondence rules, and a phonetic representation
that includes the word's amplitude envelope may
be generated subsequently from the pre-lexical
phonologic representation. If, indeed, an amplitude
envelope can be formed on-line from such
pre-lexical representations, then the addition of
vowel marks should produce a bias effect for
nonwords as well as for words in the matching
condition.

Method 

Subjects. Twenty-four undergraduate students,
all native speakers of Hebrew, participated in the
experiment for course credit or for payment. None
of the subjects participated in Experiment 1.

The design, procedure, and apparatus were 
identical to Experiment 1, except that the printed 
words and nonwords were voweled by adding their 
diacritical marks. 

Results and Discussion 

The bias indices are presented in Table 2. As in
Experiment 1, there was a bias to say "yes" in the
matching condition for high- and for low-frequency
words, but no positive bias whatsoever for
nonwords. There was no positive bias in the
nonmatching condition for words. However, similar to
the pattern obtained in Experiment 1, there was a
bias to say "no" in the nonmatching condition for
nonwords. The b indices were subjected to a
two-way ANOVA with the factors of word type and
visual presentation. The main effects of word type
and visual presentation were significant
(F(2,46) = 10.8, MSe = 0.5, p < 0.001, and F(1,23) = 20.2,
MSe = 0.99, p < 0.001, respectively). The interaction
of word type and visual presentation was
significant (F(2,46) = 3.30, MSe^rO.lS, p < 0.04). A
Tukey post-hoc analysis revealed that the
differences in bias between either type of words
and the nonwords were significant
(p < 0.05). The greater bias for a "no" response
found for nonwords in the nonmatching condition
relative to the matching condition was
significant as well. The difference in bias between
the high- and the low-frequency words was not
statistically reliable.

Table 2. Bias indices and (Standard Error of the 
Means) for high-frequency words, low-frequency words 
and nonwords, when matching and nonmatching print 
is presented simultaneously with masked speech. Print 
is presented voweled.

              High-Frequency   Low-Frequency   Nonwords

Match             0.67             0.84          0.00
                 (0.16)           (0.18)        (0.15)
No Match         -0.16            -0.0          -0.30
                 (0.15)           (0.15)        (0.13)

Average d = 0.28 (n = 24)



The results of Experiment 2 suggest that the
addition of vowel marks did not produce an effect
of bias to say "yes" in the matching condition for
nonwords. It could be pointed out that there was a
significantly greater tendency to say "no" when
nonwords appeared in the nonmatching condition
relative to the matching condition. However, even
if the absolute difference between the two
visual conditions serves as a measure for the
effect, this difference was almost twice as large for
words as for nonwords, as revealed by the
significant two-way interaction. Thus, although
the vowel marks conveyed an unequivocal
phonemic structure for the printed nonwords, and
allowed the generation of a phonological
representation for both words and nonwords, the
difference in bias between words and nonwords
remained unchanged. This suggests that the
phonetic representation that includes the
amplitude envelope information was available
only for words to influence the subjects' judgment.
The overall similarity in the effects of bias in
Experiments 1 and 2 is striking. This outcome
confirms that the bias is independent of the print's
spelling-to-sound regularity, and provides
additional support for the claim that the effect is
lexically mediated.

General Discussion 

The present study investigated the source of 
readers' ability to detect a correspondence between 
a printed word and its amplitude envelope.
Experiment 1 revealed that matching print caused
a bias to detect speech in a noise amplitude
envelope, even in the unvoweled Hebrew orthography.
This effect of bias was demonstrated only for
words, whether high- or low-frequency, and not for
nonwords. In Experiment 2 we found an identical
pattern of bias when the printed words were
voweled, and therefore were phonologically
unequivocal. All voweled words produced the effect, but not
the voweled nonwords. Moreover, the overall dif- 
ference between the matching and the nonmatch- 
ing conditions was much larger for words than for 
nonwords. 

The bias to perceive speech embedded in
amplitude-modulated noise derives from an automatic
detection of correspondence between the printed
letter string and the speech envelope related to it.
The present study was concerned with how exactly
this correspondence is detected. In order to
match the visual to the auditory information,
subjects had to generate from the print the relevant
amplitude envelope. We examined whether this
can be done by simply applying spelling-to-sound
conversion rules to assemble a phonologic
representation, and by generating the envelope on-line
from a phonetic structure that is contingent on the
phonologic representation derived from the print.
The results of both experiments suggest that it
cannot. Subjects did not show any bias to detect
speech in the noise in the matching condition
when nonwords were presented. Note that Frost
et al. (1988) did find a small effect of bias for
nonwords in the matching condition. This small
effect for nonwords is reflected in the present
study by the greater tendency to say "no" in the
nonmatching condition relative to the matching
condition. This tendency might be related to the
overall lower detectability level obtained in the
present study. In any event, the absolute difference
in bias in the matching relative to the
nonmatching condition was much larger for words
than for nonwords.

Although a phonetic representation could have
been easily generated from the printed nonwords
when they were voweled, the difference in bias
between words and nonwords remained unchanged.
This outcome suggests that the bias
effect is independent of spelling-to-sound
regularity. Moreover, the effect seemed unaffected by
the speed of print processing. Since the phonetic
representation of low-frequency words is slower to
generate from the print, the strong bias effect
found for low-frequency words relative to
high-frequency words suggests that speed of print
processing is not a crucial determinant of
the effect. Note that the addition of vowels in
Hebrew was previously shown to accelerate the
phonologic processing of low-frequency words
more than that of high-frequency words (Koriat,
1985). Nevertheless, the bias effect found for
low-frequency words did not increase in the voweled
condition relative to the unvoweled condition.

The stronger bias obtained for the low-frequency 
words might be related to the phonetic features of 
the stimuli employed. The magnitude of the bias 
effect depends, among other things, on the
distinctiveness (or uniqueness) of the amplitude
envelope, which affects the clarity of correspondence
between the amplitude envelopes presented
auditorily and the word depicted by the print. It
is possible that for some low-frequency words this 
correspondence was exceptionally clear. Recent 
results by Frost (submitted) support the 
conclusion that the bias effect is not affected by 
word frequency per se. In this study the bias for 
high- and low-frequency phonological alternatives 
of heterophonic homographs was examined, with 
an identical signal-to-noise ratio. The results 
demonstrated very similar bias effects for the 
high- and the low-frequency phonological 
alternatives (0.55 and 0.51, respectively). 

Taken together, the results of Experiments 1 and
2 suggest that the effect of bias reported in the
present and in previous studies is lexically 
mediated. We assume that the printed word 
addressed a lexical entry which contained, among 
other phonologic and phonetic information, the 
word's amplitude envelope. Thus, the envelope 
was retrieved from the mental lexicon. By this 
view, a strong effect of bias can be shown only if 
the printed letter string can be related to an 
existing lexical entry. Nonwords do not satisfy this 
requirement, and therefore did not produce the 
effect to the same extent.

The conclusion that envelopes are stored as
lexical representations is supported by a recent study
that examined the influence of lipreading on
detection of speech in noise (Repp, Frost, & Zsiga,
1991). This study examined the effect of a
visual presentation of a speaker's face on the
detection of words and nonwords in amplitude
modulated noise. The results demonstrated that
an audio-visual match created a strong bias to
respond "yes" when the stimuli were words,
whereas no bias emerged when the stimuli were
nonwords. In contrast to orthographic
information, there is a natural isomorphism
between some visible articulatory movements and
some acoustic properties of speech. Thus, the
relations of articulatory movements to
phonological and phonetic structure are
nonarbitrary, and the correspondence between
articulatory information and amplitude envelopes
may be perceived without lexical mediation.
Nevertheless, subjects did not produce any bias in
the matching condition for nonwords.

The proposal that amplitude envelopes are 
contained as holistic acoustic patterns in the 
mental lexicon is consistent with a view that 
lexical representations of spoken or printed words 
are not exclusively phonologic. Models of speech 
perception often assume that the speech 
processing system transforms the physical 
acoustic pattern into a more abstract linguistic 
representation which makes contact with the 
lexicon during word recognition. Regardless of the
nature of this representation (i.e., what specific
unit serves for activating a lexical candidate),
lexical access is often viewed as a process which 
mediates access to more abstract linguistic 
information (e.g., Mehler, 1981; Pisoni & Luce, 
1987). Our present results seem to suggest that 
the information contained in the lexicon is richer. 
In the present study, the presentation of a printed 
word resulted in the retrieval of an acoustic
template (the word's amplitude envelope) from the
lexicon.

At first glance, storing the word's amplitude 
envelope as a holistic pattern might seem to be 
without apparent benefits. The envelopes cannot 
identify a specific lexical candidate. However, they 
do convey prosodic and segmental information 
(e.g., speech timing, number of syllables, relative 
stress, and several major classes of consonant 
manner), that might help in selecting a lexical 
candidate among a highly constrained set of 
response alternatives (Van Tasell et al., 1987). 
Thus, the amplitude envelope might serve as
additional information used by the listener in
order to identify spoken words which have several
acoustic realizations, or whose phonemic
structure was not clearly conveyed (cf. Gordon,
1988). In these cases, a match between the
perceived amplitude envelope and the stored
template might confirm the identity of a lexical
candidate. Clearly, richer representations do not
constitute a parsimonious storage system.
Nevertheless, the advantage of a more complex
representational system is that it often allows a
more efficient performance of the native
speaker/listener.

One possible role of amplitude envelopes can be
suggested in regard to the psychological
distinction between words and nonwords. It is
often assumed that positive lexical decisions given
to a letter string or to a spoken phonemic
sequence are based on their relation to a semantic
representation, whereas negative decisions result
from the lack of such connections to the semantic
network. In other words, positive and negative
decisions are related to the meaningfulness of the
presented stimuli. The results of the present study
possibly suggest a different type of criterion. If
words address stored amplitude envelopes and
nonwords do not, fast lexical decisions might be
based, at least in part, on whether the printed
letter string invoked a detailed phonetic
representation such as the amplitude envelope.
According to this interpretation, one factor that 
differentiates between words and nonwords, and 
contributes to the word/nonword differences in the 
lexical decision task, is the generation of a 
phonetic code that contains envelope information. 
This suggestion, however, remains speculative 
and deserves further investigation. 

The present study has additional relevance to
old and recent debates concerning the processing
of printed words. Models of printed word
recognition are in disagreement concerning the extent of
phonological recoding during visual word
recognition. One important controversy relates to the
automaticity of phonologic recoding. It is often
assumed that phonologic recoding is very slow to
develop, and lexical access occurs (with the possible
exception of very infrequent words) directly from
the visual structure of the printed words to
meaning. This view is supported by results demonstrating
that spelling-to-sound regularity affects lexical
decisions only for low-frequency words (e.g.,
Seidenberg et al., 1984). In contrast, several
studies have suggested that phonologic information
is available very rapidly as part of visual
access to the lexicon (Perfetti et al., 1988; Van
Orden et al., 1988). Perfetti and his colleagues
have shown that the effect of a pseudoword mask
on the perception of a target word was reduced if
there was a phonemic similarity between mask
and target (e.g., "made," "mayd"). Using a different
experimental technique, Van Orden (1987) and Van
Orden et al. (1988) showed that when subjects
had to decide whether a visually presented word
belonged to a semantic category, they often made
errors to homophones or pseudohomophones of
category instances (i.e., positive responses were
given to "rows" in the category of flowers, or
"sute" in the category of clothing). These results
were taken to demonstrate that phonologic
recoding occurs automatically and pre-lexically during
lexical access.

The results of the present study support the
view that phonetic recoding occurs automatically
following the presentation of a printed word. What
our results teach us is that the processing of a
printed word results not only in a pre-lexical
phonologic representation but also in a very
detailed phonetic speech representation that is
lexical and includes the word's amplitude envelope.
This representation is automatically retrieved
from the lexicon. Note that in this and previous
studies which used the speech detection technique,
subjects were not required to respond to the
printed information. Nevertheless, they
automatically detected the correspondence between the
visual stimulus and the speech envelope.

In summary, the present study suggests that 
the presentation of words in the visual or the au- 
ditory modalities results in the generation of a 
rich array of orthographic, phonologic, and 
phonetic representations. One of these
representations is the word's amplitude envelope.
Because each of these representations may contact
the mental lexicon, auditory illusions can be
caused by visual printed information. The bias to
detect speech in noise is caused by matching print 
because printed information arouses very detailed 
speech codes.

FOOTNOTES

*Cognition, in press.

†Department of Psychology, The Hebrew University, Jerusalem,
Israel.

¹ A demonstration of this form of ambiguity may be portrayed in
English by the following example: The consonantal string "bttr"
may be read as "better," "butter," "bitter," or "batter," which are
meaningful words. In addition, many other vowel
configurations could be added to the consonants to form
nonwords. The Hebrew reader is faced with this form of
phonological ambiguity regularly in the unvoweled
orthography. The addition of the diacritical marks specifies
uniquely one phonological alternative.
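
The English analogy in this footnote can be sketched in a few lines of code. This is an illustration only, under loud assumptions: the tiny `LEXICON` is hypothetical, English vowels stand in for the Hebrew diacritical marks, and the helper names `consonant_skeleton` and `readings` are invented for the example. The sketch shows how a single consonantal string maps onto several lexical entries.

```python
VOWELS = set("aeiou")

# Hypothetical miniature lexicon, used for illustration only.
LEXICON = ["better", "butter", "bitter", "batter", "bank", "wind"]


def consonant_skeleton(word):
    """Strip the vowels from a word, mimicking unvoweled print."""
    return "".join(ch for ch in word.lower() if ch not in VOWELS)


def readings(skeleton, lexicon):
    """Return, in alphabetical order, every lexicon word sharing the skeleton."""
    return sorted(w for w in lexicon if consonant_skeleton(w) == skeleton)


if __name__ == "__main__":
    # One printed consonant string, four candidate words.
    print(readings("bttr", LEXICON))
```

With this lexicon the string "bttr" yields four candidate words, whereas a fully voweled form such as "butter" picks out exactly one of them.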

² The discriminability indices obtained in the present experiments
were lower than those obtained by Frost et al. (1988), with a
comparable signal-to-noise ratio. This difference may possibly be
attributed to differences in the spoken stimuli from which the
envelopes were generated. In the present study the spoken
stimuli were recorded by a male speaker, whereas Frost et al.
(1988) employed stimuli recorded by a female speaker. The
detection of speech in amplitude modulated noise is achieved by
perceiving local spectral peaks that rise above the flat spectral
level represented by the masking noise. Such peaks are more
salient with a female speaker because of the higher frequencies
that are characteristic of female voices.

³ Although Frost et al. (1988) showed significant bias effects over a
wide range of signal-to-noise ratios, they found reduced bias
indices at the lowest ratios. Hence, it is possible that the bias
values obtained in the present study were lower than those
obtained by Frost et al. (1988), because of the lower level of
detection in the present experiment. Note, however, that the
interpretation of the results is unaffected by the overall bias
values, since it is concerned with the differences in bias between
the matching and the nonmatching conditions.

⁴ Throughout this paper the assumption is made that the
processing of printed low-frequency words is slower than the
processing of printed high-frequency words. This assumption is
supported by the well documented frequency effect in visual
word recognition, but was not directly examined in the present
study (subjects were not required to conform to any time
constraints). Decision latencies in the speech detection
task do not reflect exclusively the speed of processing the
printed words, but also the complexity of processing the
auditory stimuli. Hence, the monitoring of reaction times in this
task does not necessarily portray the speed of processing the
printed information. Note, however, that the set of stimuli
employed in Experiments 1 and 2 is a subset of the stimuli
examined by Frost, Katz, and Bentin (1987) in the lexical decision
and the naming tasks. The frequency effect obtained by Frost et
al. (1987) was over 100 ms, supporting the assumption that the
processing of the low-frequency words was indeed slower.

Processing Phonological and Semantic Ambiguity:
Evidence from Semantic Priming at Different SOAs*

Ram Frost† and Shlomo Bentin†



Disambiguation of heterophonic and homophonic homographs was investigated in Hebrew
using semantic priming. Ambiguous primes were followed by unambiguous targets at 100
ms, 250 ms, and 750 ms SOA. Lexical decisions for targets related to the dominant
phonological alternatives of heterophonic homographs were facilitated at all SOAs.
Targets related to subordinate alternatives were facilitated only at SOAs of 250 ms or
longer. When the primes were homophonic homographs, semantic relationship facilitated
lexical decision to targets at all SOAs regardless of the dominance of the meaning to which
the targets were related. These data can be accounted for by assuming multiple lexical
entries for heterophonic homographs, single lexical entries for homophonic homographs
and phonological mediation of accessing meanings. Language-specific factors probably
account for the long-lasting activation of subordinate meanings.



Several studies of lexical disambiguation 
suggested that all the meanings of a homograph 
may be automatically activated. One experimental 
procedure used to demonstrate access to multiple 
meanings is semantic priming. It has been 
reported that homographs embedded in sentences 
facilitate lexical decisions for related targets even 
if these targets are related to meanings which are
different from those implied by the sentence
context (e.g., Onifer & Swinney, 1981; Seidenberg,
Tanenhaus, Leiman, & Bienkowski, 1982;
Swinney, 1979; Tanenhaus, Leiman, &
Seidenberg, 1979). These results were interpreted
as supporting an exhaustive, context-independent
model of lexical access for homographs, according
to which all possible meanings of a homograph
are retrieved in parallel. An alternative view is
that contextual information affects lexical
processing of homographs at an early stage,
selecting only meanings which are contextually
appropriate (e.g., Schvaneveldt, Meyer, & Becker,
1976; Glucksberg, Kreuz, & Rho, 1986).

A third approach combines features of both
previous views into an ordered access model. This
model posits exhaustive access which does not
occur in parallel, but is determined by the relative
frequency of the two meanings related to the
ambiguous word (e.g., Duffy, Morris, & Rayner, 1988;
Forster & Bednall, 1976; Hogaboam & Perfetti,
1975; Neill, Hilliard, & Cooper, 1988; Simpson,
1981; and see Simpson, 1984, for a review).
Hogaboam and Perfetti (1975) have demonstrated
that whatever the biasing context, the dominant
meaning of a homograph is retrieved first.
Evidence for an ordered access was also presented
by Simpson (1981), who showed that in a
nonbiasing context, only targets which were related to the
dominant meaning of an ambiguous word were
primed. Similarly, differential activation of high-
and low-frequency meanings of ambiguous
homographs was also demonstrated with event related
potentials (Van Petten & Kutas, 1987), and by
monitoring eye movements (Duffy et al., 1988;
Rayner & Frazier, 1989).

If more than one meaning of a homograph can
be retrieved even if it appears in a biasing
sentence context, multiple-meaning access should
be the rule for homographs presented in isolation.
This hypothesis was confirmed by Holley-Wilcox
and Blank (1980), who found that polysemous
primes (e.g., BANK) facilitated lexical decisions to
targets related to all of their meanings.
Holley-Wilcox and Blank (1980) interpreted their results as
supporting the parallel-access model. More
recently, however, Simpson and Burgess (1985)
reported evidence for an ordered access model for
isolated homographs. They have shown that in the
case of isolated homographs the most frequently
used (dominant) meaning is accessed first, while
the less frequently used (subordinate) meaning is
accessed relatively later.

Most studies of lexical ambiguity focused on 
homophonic homographs (i.e., letter strings that 
have a single pronunciation but two or more 
meanings, e.g., BANK). However, homophonic 
homographs are not the only forms of word 
ambiguity. Ambiguity can exist also in the 
relationship between the orthographic and the 
phonologic forms of a word. For example, in
contrast to "BANK," the printed letter string
"WIND" has two different pronunciations, each of
which has a different meaning. In a recent study,
Frost, Feldman, and Katz (1990) examined the
effect of phonological ambiguity in Serbo- 
Croatian. Subjects were presented simultaneously 
with printed and spoken words, and were required 
to determine whether they matched. Phonological 
ambiguity was produced using letters which 
represented different phonemes in the Cyrillic and 
Roman alphabets. The results showed that 
matching phonologically ambiguous printed words 
with their spoken realizations was delayed 
relative to the matching of unambiguous printed 
patterns in which only letters unique to one 
alphabet were used. This delay was significantly 
larger when the ambiguous print was matched 
with the less frequent spoken alternatives than
when it was matched with the more frequent 
spoken alternative. Frost et al. (1990) suggested 
that these results support a multiple access model 
in which dominant alternatives reach a higher 
level of activation. The effect of phonological 
ambiguity was examined in English as well. 
Carpenter and Daneman (1981) have
demonstrated that the duration of eye fixations on 
heterophonic homographs was longer when the 
phonological alternative implied by the semantic 
context was a low-frequency word than when it 
was a high-frequency word. In a direct comparison
between heterophonic and homophonic
homographs, Kroll and Schweickert (1978) found
that heterophonic homographs like "wind" take
longer to name than homophonic homographs.
These results suggest that in English, as in
Serbo-Croatian, heterophonic homographs are processed
differently than homophonic homographs.
However, both in English and in Serbo-Croatian
heterophonic homographs form a small and
perhaps non-representative group of words.

The unvoweled Hebrew orthography presents an
opportunity to examine the process of
disambiguating the meaning of heterophonic
homographs. In Hebrew, letters represent mostly
consonants while vowels can optionally be
superimposed on consonants as diacritical marks. In most
printed material (except for poetry, holy
scriptures and children's literature), the vowel marks
are usually omitted. Since different vowels may be
added to the same string of consonants to form
different words, the Hebrew unvoweled print
cannot specify a unique phonological unit. Therefore,
a printed letter string is very frequently
phonologically ambiguous, representing more than one
word, each with a different meaning.

In a previous study (Bentin & Frost, 1987) we 
examined the influence of semantic and phono- 
logic ambiguity on lexical decision and on naming 
isolated Hebrew words. We found that lexical de- 
cisions for unvoweled ambiguous consonant 
strings were faster than for any of the high- or 
low-frequency voweled (therefore disambiguated)
meanings of the same strings. In contrast, naming 
ambiguous unvoweled words was as fast as nam- 
ing the high-frequency voweled alternative, 
whereas naming the low-frequency alternative 
was significantly slower. On the basis of these and 
previous results (Bentin, Bargai, & Katz, 1984), 
we suggested that lexical decisions for unvoweled 
Hebrew words are generated prior to the process 
of phonological disambiguation, probably on the
basis of orthographic familiarity (cf. Balota &
Chumbley, 1984; Chumbley & Balota, 1984;
Seidenberg, 1985). This suggestion also accommo-
dates previous data demonstrating that ortho- 
graphic information is used for lexical decisions 
and naming more extensively in Hebrew than in 
other languages (Frost, Katz, & Bentin, 1987). 

In contrast to lexical decision, naming necessarily
requires the selection of one phonological
alternative of the ambiguous letter string. The
significant delay in naming the low-frequency
voweled alternative, relative to the unvoweled and the
high-frequency forms of the same letter string, led
us to support the ordered-access model for the
retrieval of phonological information. Consequently,
we suggested that, when confronted with
phonologically ambiguous letter strings, readers retrieve
the high-frequency phonological structure first.
The naming task, however, cannot disclose covert
phonological selection processes. In particular,
naming does not reveal whether phonological
alternatives other than the reader's final choice
had been accessed during the process of
disambiguation. For example, in our previous study,
subjects overtly expressed only one phonological
structure, more often the high-frequency
alternative. However, we could not determine whether
alternative words were generated but discarded
during the output process, or whether only one word
was generated from the print. Moreover, although
each phonological form was related to a different
meaning, naming does not necessarily imply
access to semantic information. Therefore, although
our previous results supported a frequency-ordered
retrieval of phonological alternatives, a
more direct measure was necessary to examine
whether more than one meaning of a heterophonic
homograph is automatically activated, and
whether this access is ordered by the relative
frequency of each meaning.

In the present paper we addressed this question 
using a semantic priming paradigm similar to 
that used by Simpson and Burgess (1985). 
Isolated ambiguous consonant strings were 
presented as primes and the targets were related 
to only one of their possible meanings. We 
assumed that if a specific meaning of the prime is 
initially accessed, lexical decision for targets that 
are related to that meaning should be facilitated. 

A second question addressed in the present
study refers to the time course of activation of
dominant and subordinate (i.e., high- and
low-frequency) meanings of phonologically ambiguous
letter strings. Several studies in English have
shown that in a sentence context, the subordinate
meaning is active only during a limited period of
time (Seidenberg et al., 1982; Van Petten & Kutas,
1987). Similar results were found also for isolated
homographs (Kellas, Ferraro, & Simpson, 1988;
Simpson & Burgess, 1985). In particular, Simpson
and Burgess (1985) found that an SOA of 16 ms
between prime and target was sufficient to
facilitate lexical decisions for targets related to the
dominant meaning, but not for targets related to
the subordinate meanings. Relatedness to the
subordinate meaning facilitated lexical decisions
only when the SOA between prime and target
ranged from 100 to 300 ms. The fast decay of the
subordinate meaning was explained in that study
by assuming that the limited-capacity attention
system (Neely, 1977) must focus on only one
meaning, and in the absence of disambiguating
context, the dominant alternative is usually
chosen (see also Kellas et al., 1988). However, since in
Hebrew several phonological units are activated
in addition to several semantic nodes, it is possible
that the activation of both dominant and
subordinate alternatives lasts longer. This might happen,
for example, if the retrieval of different
phonological units results in more extensive lexical
processing. In the present study we examined this
possibility by manipulating the SOA between the
ambiguous primes and the targets.

EXPERIMENT 1-A 

In Experiment 1-a we presented subjects with
unvoweled heterophonic homographs as primes.
Applying different vowel patterns, each prime
could be read both as a high- and as a
low-frequency word. In each trial the prime was
followed by a word or by a nonword target at 100 
ms or 250 ms SOA. Subjects were instructed to 
read the primes silently and to make lexical 
decisions to the targets. Across subjects, each 
target was either unrelated to its prime, or related 
to the dominant or to the subordinate meaning of 
the ambiguous prime. Facilitation of lexical 
decisions in any related condition (relative to the 
unrelated condition) was considered evidence for 
accessing the related meaning of the prime. 

Method 

Subjects. Forty undergraduate students, native 
Hebrew speakers, participated in the experiment 
for course credit or for payment. 

Stimuli. The primes were 40 ambiguous consonant
strings which represented both a high- and a
low-frequency word. In the absence of a reliable
frequency count in Hebrew, we estimated the
subjective frequency of each word using the following
procedure: From a pool of 100 ambiguous consonant
strings we generated two lists of 100 voweled
words each. Each list of disambiguated words
contained only one form of the possible realizations
of each homograph. Dominant and subordinate
meanings were equally distributed between
the lists. Both lists were presented to 50
undergraduate students, who rated the frequency of
each word on a 7-point scale from very infrequent
(1) to very frequent (7). The rated frequencies
were averaged across all 50 judges. Each of the 40
homographs that were selected for this study
represented two words that differed in their rated
frequency by at least 1 point on that scale. The
validity of this selection was then tested by
naming: Twenty-four subjects were presented with the
unvoweled homographs, and their vocal responses
were recorded. We measured the relative dominance
of each phonological alternative as reflected
by the number of times it was actually chosen and
pronounced by the subjects. Only those homographs
whose frequency judgments coincided with
the results obtained in the naming task (i.e., at
least 66% of the subjects chose to name the
phonological alternative that had a higher frequency
rating) were used in the experiment.
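
The two selection criteria described above (a rating gap of at least 1 point and at least 66% agreement in the naming task) can be sketched as a simple filter. This is only an illustrative sketch: the record layout, the field names, and the example values are hypothetical and do not come from the study's materials.

```python
from dataclasses import dataclass


@dataclass
class Homograph:
    letters: str      # unvoweled consonant string (hypothetical label)
    rating_a: float   # mean rated frequency of reading A (1-7 scale)
    rating_b: float   # mean rated frequency of reading B (1-7 scale)
    named_a: int      # naming-task subjects who produced reading A
    named_b: int      # naming-task subjects who produced reading B


def is_selected(h, min_gap=1.0, min_share=0.66):
    """Return True if the homograph meets both selection criteria."""
    # Criterion 1: the two readings differ by at least 1 rating point.
    if abs(h.rating_a - h.rating_b) < min_gap:
        return False
    total = h.named_a + h.named_b
    if total == 0:
        return False
    # Criterion 2: the reading rated as more frequent is also the one
    # that at least 66% of the naming subjects actually produced.
    named_dominant = h.named_a if h.rating_a > h.rating_b else h.named_b
    return named_dominant / total >= min_share


# Hypothetical pool entries, 24 naming subjects each.
pool = [
    Homograph("XXX1", 5.8, 3.1, 18, 6),   # gap 2.7, 75% agreement
    Homograph("XXX2", 4.9, 4.3, 20, 4),   # gap below 1 point
    Homograph("XXX3", 5.5, 3.0, 14, 10),  # only 58% agreement
]
selected = [h.letters for h in pool if is_selected(h)]
```

With 24 naming subjects, the 66% criterion corresponds to at least 16 subjects producing the higher-rated reading.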

Two targets were associated with each selected
homograph. One target was semantically related
to its dominant meaning, and the other to its
subordinate meaning. The targets were all
unambiguous (i.e., even without vowel marks they
represented only one word). In order to ensure
similar semantic relatedness for the dominant and the
subordinate meanings, the semantic relation of
primes and targets was rated by the same 50
judges on a 7-point scale, from unrelated (1) to
highly related (7). The means of those ratings
were 5.2 for the dominant meanings, and 5.3 for
the subordinate meanings. Each of the 80 targets
was also paired with an unrelated prime. The
unrelated primes were 40 heterophonic homographs
selected from the original pool and different from
those used in the "related" conditions. Because
none of their possible readings was related to the
targets, and because dominance is irrelevant in
the unrelated condition, the same prime preceded
the targets used in the dominant and the
subordinate related conditions. Hence, there were only 40
different ambiguous primes in the unrelated
conditions, which were rotated across subjects. In
addition to the word-word pairs, 80 word-nonword
pairs were introduced as fillers. The words were
heterophonic homographs different from those
contained in the original pool. The nonwords were
consonant strings that have no meaning in
Hebrew regardless of vowel configuration. An
example of related and unrelated prime-target pairs
is presented in Figure 1.

Design. There were eight experimental
conditions: Different targets were related to the
dominant or to the subordinate meanings of the
ambiguous primes; each of the related targets was
also presented in an unrelated condition. In each
of these four possible pairings, the SOA between
primes and targets was either 100 or 250 ms. Four
lists of words were formed: Each list contained 10
prime-target pairs in each of the eight
experimental conditions and 80 word-nonword fillers. The
prime-target pairs were rotated across lists by a
Latin Square design: pairs that were related in one
list were unrelated in another list, pairs which
appeared with a prime/target SOA of 100 ms in
one list appeared with an SOA of 250 ms in another
list, etc. The purpose of this rotation was to
present the targets that were related to the
dominant meanings of the primes and the targets
that were related to the subordinate meanings of
the primes in both the related and the unrelated
conditions, at all SOAs, while avoiding repetitions
within a list. Hence, each target word served as
its own control for the measurement of semantic
facilitation in an across-subjects design (see
Figure 1).
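
One way to realize such a rotation is sketched below. The text does not spell out the exact assignment scheme, so the modular rotation is an assumption. In particular, the offset of 2 for subordinate targets is assumed only so that a homograph serves as a related prime once per list. The item indices are hypothetical placeholders for the 40 homographs.

```python
from collections import Counter
from itertools import product

N_ITEMS = 40  # homographs, each with a dominant and a subordinate target
# Four cells per target: (related, 100), (related, 250),
# (unrelated, 100), (unrelated, 250).
CELLS = list(product(("related", "unrelated"), (100, 250)))


def build_lists():
    """Rotate every target through the four cells across four lists."""
    lists = []
    for k in range(len(CELLS)):
        trials = []
        for i in range(N_ITEMS):
            # Assumed offset: shifting subordinate targets by 2 flips
            # relatedness, so the two targets of one homograph are never
            # both related within the same list.
            for dominance, offset in (("dominant", 0), ("subordinate", 2)):
                relatedness, soa = CELLS[(i + k + offset) % len(CELLS)]
                trials.append((i, dominance, relatedness, soa))
        lists.append(trials)
    return lists


lists = build_lists()
# Each list holds 10 targets in each of the 8 experimental conditions.
per_list_counts = [
    Counter((d, r, s) for _, d, r, s in trials) for trials in lists
]
```

Across the four lists, every target appears exactly once in each relatedness-by-SOA cell, which is what allows each target to serve as its own control.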



Unvoweled prime:             MLCH

                             Dominant         Subordinate
Phonological alternatives:   MELACH           MALACH
Semantic meaning:            "salt"           "sailor"

Condition      Prime          Target (dominant)   Target (subordinate)
Related        MLCH           "sugar"             "ship"
Unrelated      KLV ("dog")    "sugar"             "ship"

Figure 1. Example of related and unrelated prime-target pairs in unvoweled Hebrew.

Procedure and apparatus. The subjects were
tested individually. They were instructed to read
the primes and to make lexical decisions only for
the targets by pressing a "word" or a "nonword"
response key. The dominant hand was always
used for "word" responses. All stimuli were
presented at the center of a Macintosh computer
screen (bold Hebrew font, size 24). The subjects
sat approximately 70 cm from the screen, so that
the stimuli subtended a horizontal visual angle of
4 degrees on the average. A trial began with the
presentation of the prime, which was replaced by
the target at the end of the respective SOA period.
The target was continuously exposed until a
response was recorded. The inter-stimulus
interval was 2500 ms from the subject's response to
the onset of the following prime. Each session
started with 16 practice trials. The 160 test trials
were presented in one block.
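
As a worked check of the viewing geometry, the physical width implied by a 4-degree horizontal visual angle at 70 cm follows from the standard relation width = 2 d tan(angle / 2). The function name below is illustrative only.

```python
import math


def stimulus_width_cm(distance_cm, angle_deg):
    """Width of a stimulus subtending angle_deg at distance_cm."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


# Viewing distance and visual angle as reported in the procedure.
width = stimulus_width_cm(70, 4)
```

Under these values the stimuli would have been about 4.9 cm wide on average.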

Results 

Means and standard deviations of RTs for
correct responses were calculated for each subject
in each of the eight experimental conditions.
Within each subject/condition combination, RTs
that were outside a range of 2 SDs from the
respective mean were excluded, and the mean was
recalculated. Outliers accounted for less than 5%
of all responses. This procedure was repeated in
all six experiments in the present study.
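
The trimming procedure can be sketched as follows for one subject/condition cell. The text does not say whether the sample or the population SD was used; the sketch assumes the sample SD. The RT values are invented for illustration.

```python
from statistics import mean, stdev


def trimmed_mean(rts, n_sd=2.0):
    """Mean RT after one pass of excluding values beyond n_sd SDs."""
    if len(rts) < 2:
        return mean(rts)
    m, sd = mean(rts), stdev(rts)  # sample SD (an assumption)
    kept = [rt for rt in rts if abs(rt - m) <= n_sd * sd]
    return mean(kept)


# Hypothetical correct-response RTs (ms) for one subject in one condition.
cell_rts = [650, 700, 680, 720, 690, 710, 670, 705, 695, 1500]
cell_mean = trimmed_mean(cell_rts)
```

In this invented cell only the 1500 ms response falls outside 2 SDs of the initial mean, so the recalculated mean is based on the remaining nine responses.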

RTs and errors in the different experimental
conditions are presented in Table 1. Lexical
decisions to targets related to the dominant
meanings of the ambiguous primes were faster
than to unrelated targets, at both 100 and 250 ms
SOA. In contrast, lexical decisions to targets
related to the subordinate meanings were faster
than responses to unrelated targets only at 250
ms SOA. At 100 ms SOA, lexical decisions to
targets related to the subordinate meanings were
apparently slower than lexical decisions to
unrelated targets.

The statistical significance of those differences
was assessed by an analysis of variance (ANOVA)
across subjects (F1) and across stimuli (F2), with
the main factors of semantic relatedness (related,
unrelated), dominance of prime-meaning
(dominant, subordinate), and SOA (100, 250 ms).
The main effects of relatedness, dominance, and
SOA were significant: RTs to related targets were
faster than to unrelated targets [F1(1,39)=22.0,
MSe=1789, p<.001; F2(1,39)=15.7, MSe=2655,
p<.001]; RTs to targets that referred to the
dominant meaning of the prime in the related
condition were faster than RTs to targets that
referred to the subordinate meaning of the prime
[F1(1,39)=14.6, MSe=2373, p<.001; F2(1,39)=5.75,
MSe=7609, p<0.02]; and RTs at 250 ms SOA were
faster than at 100 ms SOA [F1(1,39)=63.9,
MSe=2315, p<.001; F2(1,39)=27.0, MSe=5319,
p<.001].1 Relatedness interacted with dominance
[F1(1,39)=5.62, MSe=2119, p<.001; F2(1,39)=3.16,
MSe=3594, p<.08], and with SOA [F1(1,39)=14.0,
MSe=1256, p<.001; F2(1,39)=10.5, MSe=2332,
p<.002]. The interaction of SOA and dominance
was not significant (F1, F2 <1.0). The three-way
interaction was significant in the subject analysis
[F1(1,39)=4.0, MSe=2191, p<.05], but only
approached significance in the stimulus analysis
[F2(1,39)=2.7, MSe=4868, p<0.10]. The three-way
interaction seems to have resulted in part from
greater RT differences between SOAs for
unrelated dominant primes (37 ms) than for
unrelated subordinate primes (17 ms). We do not
have an explanation for this difference.
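
The difference between the subject (F1) and stimulus (F2) analyses lies in the unit over which trial RTs are averaged before the ANOVA is run. The sketch below shows only that aggregation step, not the ANOVA itself. The trial records and field names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def cell_means(trials, unit):
    """Average RTs per (unit, condition); unit is 'subject' or 'item'."""
    cells = defaultdict(list)
    for t in trials:
        cells[(t[unit], t["condition"])].append(t["rt"])
    return {key: mean(values) for key, values in cells.items()}


# Hypothetical trials: two subjects, two target items, one factor.
trials = [
    {"subject": "s1", "item": "t1", "condition": "related", "rt": 640},
    {"subject": "s1", "item": "t2", "condition": "unrelated", "rt": 700},
    {"subject": "s2", "item": "t1", "condition": "unrelated", "rt": 720},
    {"subject": "s2", "item": "t2", "condition": "related", "rt": 660},
]
by_subject = cell_means(trials, "subject")  # input to an F1 analysis
by_item = cell_means(trials, "item")        # input to an F2 analysis
```

Because targets are rotated across lists, each item contributes to every condition only when averaged over subjects, which is why the item analysis treats stimuli as the random factor.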



Table 1. Reaction times (ms) and (percentage of errors) to related and unrelated targets in the different experimental
conditions with phonologically ambiguous (unvoweled) primes (Experiments 1-a and 1-b).

                   Dominant Primes         Subordinate Primes      Nonwords
SOA (ms)           100     250     750     100     250     750     1-a     1-b

Unrelated          715     678     692     718     701     714     754     778
                   (9%)    (8%)    (8%)    (12%)   (12%)   (10%)   (11%)   (8%)

Related            684     639     658     739     669     692
                   (10%)   (7%)    (8%)    (10%)   (13%)   (11%)

Priming Effect     +31     +39     +34     -21     +32     +22

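
The priming effects reported in Table 1 are the unrelated mean minus the related mean in each cell. This can be verified directly from the tabled RTs. The dictionary layout is only an illustrative encoding of the table.

```python
# Mean RTs (ms) from Table 1, keyed by (prime dominance, SOA in ms).
unrelated = {
    ("dominant", 100): 715, ("dominant", 250): 678, ("dominant", 750): 692,
    ("subordinate", 100): 718, ("subordinate", 250): 701,
    ("subordinate", 750): 714,
}
related = {
    ("dominant", 100): 684, ("dominant", 250): 639, ("dominant", 750): 658,
    ("subordinate", 100): 739, ("subordinate", 250): 669,
    ("subordinate", 750): 692,
}

# Positive values indicate facilitation, negative values inhibition.
priming = {cell: unrelated[cell] - related[cell] for cell in unrelated}
```

The only negative value is the subordinate cell at 100 ms SOA, where related targets were answered 21 ms more slowly than unrelated targets.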

To elaborate the three-way interaction, and
because we were concerned with the different
patterns of facilitation for the dominant and the
subordinate meanings at the short and the longer
SOAs, we conducted separate analyses of the
relatedness and dominance effects at each SOA.
These respective ANOVAs showed that
relatedness interacted with dominance at 100 ms
SOA [F1(1,39)=13.6, MSe=1602, p<.001;
F2(1,39)=8.4, MSe=2905, p<.006], but not at 250
ms SOA [F1, F2 (1,39)<1.0]. A Tukey-A post hoc
analysis of the interaction at 100 ms SOA
revealed that the difference between unrelated
targets and targets related to the subordinate
meanings of the homographs was not significant,
whereas lexical decisions for targets related to
the dominant meaning of the homographs were
faster than to unrelated targets.

The differences in error rates between the
various experimental conditions did not produce
significant effects.

EXPERIMENT 1-B

A more complete description of the time course 
of activating the dominant and subordinate 
meanings of heterophonic homographs required 
examination of the semantic priming effects at an 
SOA longer than 250 ms. This condition could not 
be included in the first part of the experiment be- 
cause the total number of stimuli used in our ro- 
tated within-subjects design did not permit an ad- 
ditional division.2 Therefore, this condition was 
examined in a second group of 40 subjects sam- 
pled from the same population of undergraduates 
as in Experiment 1-a. 

The stimuli, design, and procedure were similar 
to those used in Experiment 1-a, except that the 
SOA between primes and targets was 750 ms. To 
make the structure of the stimulus lists as similar 
as possible to the previous experiment, we 
introduced as fillers an identical number of 
heterophonic homographs with a shorter SOA of 
250 ms. Moreover, because subjects encountering 
only long delays between primes and targets 
might actively invoke the two phonologic 
alternatives, whereas subjects encountering both 
long and short delays might not, a second purpose 
of the fillers with the shorter SOAs was to prevent 
subjects from developing this search strategy. 

Results 

RTs were faster for related targets than for
unrelated targets, and for targets related to the
dominant meaning of the prime than for targets
related to the subordinate meaning (Table 1). The
statistical significance was assessed in a two-way
analysis of variance across subjects (F1) and
across stimuli (F2). The main factors were
semantic relatedness (related, unrelated) and
dominance of prime-meaning (dominant,
subordinate). The ANOVA showed that both main
effects were significant [F1(1,39)=24.3,
MSe=1286, p<0.001, and F2(1,39)=14.6,
MSe=1765, p<0.001, for semantic relatedness, and
F1(1,39)=10.8, MSe=3123, p<0.002, and
F2(1,39)=11.6, MSe=3378, p<0.002, for dominance
of the prime-meaning]. The interaction of the two
factors was not significant [F1, F2 (1,39)<1.0].
Planned comparisons revealed that RTs to targets
related to the subordinate alternatives of the
prime-meanings were significantly faster than in
the unrelated condition [t(1,39)=2.54, p<0.01]. The
pattern of semantic facilitation obtained for the
fillers with 250 ms SOA was similar to the pattern
obtained with the identical SOA in Experiment 1-a
(33 ms facilitation for targets related to the
dominant meaning, and 20 ms for targets related
to the subordinate meanings).

Discussion 

The results of Experiments 1-a and 1-b suggest
that meanings of isolated heterophonic homographs
were retrieved as predicted by an ordered-access
model. The meaning of the dominant
phonological alternative was accessed faster than
that of the subordinate phonological alternative.
However, the time course of activating the
subordinate meanings was different from that found
with English homophonic homographs (Simpson &
Burgess, 1985) in several ways. The subordinate
meanings in Simpson and Burgess's study were
already activated at 100 ms, and decayed after
300 ms from stimulus onset. In contrast, the
meanings of subordinate phonological alternatives
in the present study were not available at 100 ms.

The subordinate alternatives were active at 250
ms and, in contrast to English, they were still
available as late as 750 ms from stimulus onset.
Hence, the present data suggest that subordinate
meanings of heterophonic homographs are
accessed more slowly than the subordinate meanings of
polysemous words, but they remain active for a
longer time.

The divergence between the time course of
disambiguating Hebrew heterophonic homographs
and English homophonic homographs might
reflect language-related differences or, alternatively,
basic differences in processing heterophonic and
homophonic homographs. However, before going
any further in speculating about mechanisms of
disambiguation of homographs, it was important
to make sure that the dominant and subordinate
forms of the present stimuli were equivalent in
their efficiency to prime their respective targets.
To control for differences in accessing dominant
and subordinate meanings in the absence of
phonological ambiguity, and to understand better the
independent relationship between the dominant and
the subordinate phonological alternatives of one
letter string and their respective meanings, a
second experiment was conducted. In the second
experiment we examined the pattern of semantic
facilitation of targets related to each meaning, when
the phonological units to which they were related
were presented in a disambiguated form.

EXPERIMENTS 2-A AND 2-B 
The interpretation of the apparently ordered re- 
trieval of the subordinate and the dominant mean- 
ings of the phonologically ambiguous letter strings 
presented in Experiments 1-a and 1-b was based 
on the relative magnitude of priming effects. This 
interpretation assumed that the observed differ- 
ence between dominant and subordinate meanings 
of the primes is accounted for by their phono- 
logical ambiguity. In other words, it was assumed 
that in a disambiguated form, the subordinate and 
the dominant primes would have primed their re- 
spective targets equally. The purpose of 
Experiment 2 was to test this assumption. 

Hebrew provides a unique opportunity to 
compare semantic priming effects involving 
alternative meanings of homographs with the 
semantic priming effects involving the same words 
presented explicitly, i.e., in a non-ambiguous form. 
In contrast to homophonic homographs that can 
be disambiguated only by semantic context (for
example by embedding the homograph in a
sentence), Hebrew heterophonic homographs can 
be disambiguated and still be presented as 
isolated words. This can be achieved by adding the 
diacritical dots to the ambiguous letter strings. 
The advantage of this procedure is that the 
experimental structure and the priming 
conditions remain constant for the ambiguous and 
unambiguous presentations. 

Method 

Subjects. Eighty undergraduate students, all 
native Hebrew speakers, participated for course 
credit or for payment. None of the subjects
participated in the previous experiments. As in 
the previous experiments, 40 subjects were tested 
with prime/target SOAs of 100 ms and 250 ms 
(Experiment 2-a), and the other 40 with 750 ms 
SOA (Experiment 2-b). 

Stimuli, design, and procedure. The stimuli, 
experimental design, and procedure were identical 
to those used in Experiments 1-a and 1-b, except
that all the words and nonwords were presented
in conjunction with vowel marks. Thus, each word
was presented in an unequivocal phonological 
form, and had only one meaning. 

Results 

At all SOAs and with both dominant and 
subordinate primes, RTs to related targets were 
faster than RTs to unrelated targets (Table 2). 

The statistical significance of the priming effects 
at 100 ms and 250 ms SOAs in Experiment 2-a 
was assessed by ANOVA across subjects (F1) and
across stimuli (F2). The main factors were 
semantic relatedness (related, unrelated), 
dominance of prime (dominant, subordinate), and 
SOA (100 ms, 250 ms). 



Table 2. Reaction times (ms) and (percentage of errors) to related and unrelated targets in the different experimental
conditions with phonologically unambiguous (voweled) primes (Experiments 2-a and 2-b).

                   Dominant Primes         Subordinate Primes      Nonwords
SOA (ms)           100     250     750     100     250     750     2-a     2-b

Unrelated          722     681     716     746     702     725     767     765
                   (8%)    (10%)   (7%)    (9%)    (12%)   (8%)    (9%)    (8%)

Related            690     634     672     703     664     683
                   (8%)    (8%)    (8%)    (8%)    (8%)    (6%)

Priming Effect     +32     +47     +44     +43     +38     +42


The ANOVA showed that across SOAs, RTs to
targets in the related condition were faster than
in the unrelated condition [F1(1,39)=68.8,
MSe=1852, p<.001; F2(1,39)=55.5, MSe=2194,
p<.001], RTs to targets related to dominant
primes were faster than to targets related to
subordinate primes [F1(1,39)=18.6, MSe=2013,
p<.001; F2(1,39)=6.3, MSe=5491, p<.01], and RTs
were faster at 250 ms SOA than at 100 ms SOA
[F1(1,39)=37.9, MSe=4223, p<.001; F2(1,39)=158,
MSe=1123, p<.001]. However, in contrast to
Experiment 1-a, none of the interactions were
statistically significant (F1, F2<1.0 for Relatedness
by Frequency; F1<1.0, F2=1.3 for Relatedness by
SOA; F1, F2<1.0 for Frequency by SOA; and
F1=1, F2=1.0 for the three-way interaction).

The analysis of the priming effects at 750 ms
SOA in Experiment 2-b revealed a significant
effect of Semantic relatedness [F1(1,39)=44.3,
MSe=1689, p<.001; F2(1,39)=50.1, MSe=1639,
p<.001], and no main effect of Frequency of the
prime [F1(1,39)=2.8, MSe=1432, p>.09;
F2(1,39)=1.4, MSe=2692, p>.19]. The interaction
between the two factors was not significant (F1,
F2<1.0). The effects of semantic facilitation
obtained with the fillers at 250 ms SOA in
Experiment 2-b were very similar to the effects
obtained with targets at the same SOA in
Experiment 2-a (49 ms for targets related to the
dominant alternatives, and 47 ms for targets
related to the subordinate alternatives).

Discussion 

The absence of an interaction between semantic 
priming and the frequency of the prime revealed 
that, in disambiguated form, the dominant and 
the subordinate phonological alternatives of the 
heterophonic homographs were equally effective 
in facilitating lexical decisions to related targets. 
In addition, the results of Experiments 2-a and 2-b 
showed that the time course of processing high- 
and low-frequency unambiguous Hebrew words 
was similar. Hence, Experiments 2-a and 2-b sug- 
gest that the difference in processing dominant 
and subordinate alternative meanings of hetero- 
phonic homographs observed in Experiments 1-a 
and 1-b was, indeed, caused by the ambiguous na- 
ture of the primes that were both phonologically 
and semantically equivocal. 

Although the primes in Experiments 2-a and 2-b
were unambiguous, an effect of dominance was
obtained. Targets related to the dominant
phonological alternatives incurred faster RTs than
targets related to the subordinate phonological
alternatives. Because in Experiments 2-a and 2-b
the primes were unequivocal, this effect should be
considered a pseudodominance effect. This
outcome might have resulted from our design in
which different targets followed identical
ambiguous primes. Consequently, the comparison
across dominant and subordinate categories
involved different target words. It is possible that
there were intrinsic decision time differences
between the target words, such that targets that
happened to be related to the dominant
alternatives were accessed faster than targets
related to the subordinate alternatives. However,
since the conclusions concerning semantic
facilitation depend on the interaction within
prime categories (comparing RTs to the same
target in related vs. unrelated conditions), the
pseudodominance effect has no theoretical
importance.

In Experiments 3-a and 3-b we sought to
examine the possible sources of the differences
between the time course of activation found with
English homophonic homographs (e.g., Simpson &
Burgess, 1985) and our present results
with Hebrew heterophonic homographs. We
endeavored to isolate the effects of semantic and
phonologic ambiguity and to control for possible
language-specific factors. For this purpose, we
used the design of Experiment 1 with a new
set of stimuli. These were Hebrew homophonic
homographs, i.e., words like "BANK," that have
two meanings but only one pronunciation.

EXPERIMENTS 3-A AND 3-B 
Experiments 3-a and 3-b examined the time 
course of activation of dominant and subordinate 
meanings of Hebrew homophonic homographs.
Each stimulus was a pattern of letters representing
only one word (one phonological unit); that
word, however, had two meanings, one more fre- 
quent than the other. Consequently, like most 
English homographs, these stimuli were semanti- 
cally ambiguous but phonologically unequivocal. 
Using exactly the same design as in the previous 
experiments, the present experiments allowed 
comparison of homophonic and heterophonic ho- 
mographs within one language - Hebrew. 

Method 

Subjects. The subjects were 120 undergraduates, 
native Hebrew speakers. They participated in the 
experiments for credit or payment. Sixty subjects
participated in Experiment 3-a, and 60 
participated in Experiment 3-b. 

Stimuli. The primes were 36 ambiguous
homophonic homographs that were selected from
a pool of 120 homographs. Each selected word had
a dominant and a subordinate meaning.
Dominance was determined empirically by the
following procedure: 50 subjects rated the
frequency of the meanings of all homographs on a
7-point scale from very infrequent (1) to very
frequent (7). Because naming could not
distinguish between meanings of homophonic
homographs, the rated frequencies were validated
differently than in Experiment 1. A group of 32
subjects were read a list containing only
homophonic homographs, the meanings of which
were rated at least 1 point apart on the frequency
scale. These words were read one at a time. The
subjects responded verbally with their first
association to each word. The meaning that the
subjects had in mind was inferred from their
response. Dominant meanings were those that
were produced by at least 66% of the subjects, and
subordinate meanings were those that were not
produced by more than 33% of the subjects. Each
prime was paired with two target words: One was
semantically related to the dominant meaning and
the other to the subordinate meaning. Thirty-six
additional homophonic homographs from the same
pool were used to form semantically unrelated
pairs. In addition to the word-word pairs, 72
word-nonword pairs were again introduced as fillers.
The words were homophonic homographs that
were taken from the original pool. The 72
nonwords were taken from Experiment 1-a.
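
The association-based dominance criterion can be sketched as a small classifier. The thresholds (at least 66%, at most 33%) are those stated above; the function name, the "unclassified" label, and the example counts are hypothetical.

```python
def classify_meaning(n_produced, n_subjects, dom=0.66, sub=0.33):
    """Label a meaning by the share of subjects whose first association
    pointed to it."""
    share = n_produced / n_subjects
    if share >= dom:
        return "dominant"
    if share <= sub:
        return "subordinate"
    return "unclassified"


# Hypothetical counts out of the 32 subjects in the association task.
labels = [classify_meaning(n, 32) for n in (22, 10, 16)]
```

A homograph would qualify as a prime only when one of its meanings is labeled dominant and the other subordinate under this scheme.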

Design and procedure. The design of
Experiments 3-a and 3-b was identical to that of
Experiments 1-a and 1-b. One group of 60 subjects
was tested using SOAs of 100 ms and 250 ms
between primes and targets. Fifteen subjects were
assigned to each of four lists, structured exactly as
in Experiment 1-a (except that in each list there
were 9 targets rather than 10 in each condition).
Across lists, each target appeared in both related
and unrelated conditions, and at both SOAs.

The second group of 60 subjects was tested 
using the same stimulus lists, with a design 
identical to Experiment 1-b, that is with the 
longer SOA (750 ms). Although separate analyses 
were conducted in each group, we will report all 
the results in one section. 

Results and Discussion

The RTs in the related condition were faster 
than in the unrelated condition at all SOAs, for 
dominant as well as for subordinate targets 
(Table 3). 

Separate ANOVAs were conducted to assess
the reliability of the priming effects across
subjects (F1) and across stimuli (F2), at 100 ms
and 250 ms SOAs (Experiment 3-a).
These ANOVAs showed that across SOAs, RTs to
targets in the related condition were faster than
in the unrelated condition [F1(1,59)=19.7,
MSe=1957, p<.001; F2(1,35)=15.6, MSe=1690,
p<.001], RTs to targets related to dominant
primes were faster than to targets related to
subordinate primes [F1(1,59)=14.6, MSe=1675,
p<.001; F2(1,35)=4.5, MSe=5166, p<.04], and RTs
were faster at 250 ms SOA than at 100 ms
SOA [F1(1,59)=145, MSe=2035, p<.001;
F2(1,35)=250, MSe=675, p<.001]. As with
unambiguous primes in Experiments 2-a and 2-b
and in contrast to Experiments 1-a and 1-b,
semantic relatedness did not reliably interact with
any other factor.



Table 3. Reaction times (ms) and (percentage of errors) to related and unrelated targets in the different experimental
conditions with homophonic homographs as primes (Experiments 3-a and 3-b).

                   Dominant Primes         Subordinate Primes      Nonwords
SOA (ms)           100     250     750     100     250     750     3-a     3-b

Unrelated          591     545     567     606     561     580     680     653
                   (6%)    (8%)    (6%)    (7%)    (8%)    (7%)    (9%)    (8%)

Related            580     522     553     588     540     570
                   (6%)    (6%)    (6%)    (6%)    (8%)    (8%)

Priming Effect     +11     +23     +14     +18     +21     +10


The analysis of the priming effects at 750 ms
SOA (Experiment 3-b) showed that the semantic
relatedness effect was reliable [F1(1,59)=7.9,
MSe=1098, p<.007; F2(1,35)=7.6, MSe=694,
p<.009] and RTs to targets were faster following
dominant primes than following subordinate
primes [F1(1,59)=10.6, MSe=1265, p<.002;
F2(1,35)=3.9, MSe=3323, p<0.05]. As with the
shorter SOAs, the interaction between the two
factors was not reliable (F1, F2<1.0).

The most important finding in Experiments 3-a
and 3-b was that, in the absence of phonological
ambiguity, both the dominant and the subordinate
meanings of Hebrew polysemous words were
already available at 100 ms from stimulus onset.
As with heterophonic homographs, they
remained active at least during the first 750 ms.
These results suggest that the distinct pattern of
activation observed for low-frequency phonological
alternatives of heterophonic homographs (in
Experiment 1-a) was caused by phonological
rather than semantic ambiguity.

Because our study did not include a condition of 
very short SOA (16 ms) between primes and 
targets, the onset of activating dominant and 
subordinate meanings of Hebrew homophonic 
homographs cannot be directly compared to the 
pattern of activation reported by Simpson and 
Burgess (1985) with English materials. However, 
the persistent activation of subordinate meanings 
at the longer SOA of 750 ms in the present 
experiment clearly differs from the pattern of 
activation observed in English (Simpson & 
Burgess, 1985). This divergence suggests that the 
process of disambiguating polysemous words 
might involve language-specific components. 
Possible interpretations of these results are 
elaborated in the general discussion. 

Across experiments comparisons 

Several formal comparisons were conducted to 
assess priming effects involving heterophonic 
primes at all SOAs (Experiments 1-a and 1-b) and 
priming effects involving homophonic primes 
(Experiments 3-a and 3-b). For these analyses the 
relevant data from the four experiments were 
combined in mixed ANOVA designs in which the 
type of homographs was introduced as an 
additional between-subjects factor. First, we 
compared the pattern of semantic facilitation of 
the subordinate meanings only, across all SOAs, 
for the two types of homographs. The three-way 
interaction of relatedness, SOA, and homograph 
type was significant (F1(1,98)=6.9, MSe=1914, 
p<0.009; F2(1,74)=3.6, MSe=2926, p<0.36), 
suggesting a reliable difference in the time course 
of activating the subordinate meanings of 
heterophonic and homophonic homographs. 

Another finding regarding the two types of 
homographs was that the average effects of 
semantic priming of the dominant alternatives 
across all SOAs were twice as strong for 
heterophonic homographs (35 ms facilitation) 
as for homophonic homographs (16 ms 
facilitation). The statistical significance of this 
difference was assessed by a mixed ANOVA 
design in which RTs to the dominant meanings of 
heterophonic homographs at all three SOAs were 
compared with the respective RTs to the dominant 
meanings of homophonic homographs. The type of 
homography served again as a between-subjects 
factor. This analysis revealed a significant 
interaction of relatedness and homography type 
(F1(1,98)=7.6, MSe=1675, p<0.007; F2(1,74)=6.0, 
MSe=1994, p<0.02). Whether the shrinking of the 
priming effect for homophonic homographs 
relative to heterophonic homographs reflects 
primarily differences in processing the two types 
of homographs, or merely a floor effect due to 
much faster responses to homophonic than 
heterophonic homographs, was not clear. 
Therefore, we replicated Experiment 1-a using an 
identical number of subjects and identical 
methods. 
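
The 16 ms figure for homophonic homographs can be recovered by averaging the dominant-prime priming effects in Table 3. The sketch below does so and compares the result with the 35 ms average quoted in the text for heterophonic homographs. The per-SOA heterophonic values are not repeated in this section, so only the quoted mean is used; the variable names are our own.

```python
from statistics import mean

# Dominant-prime priming effects (ms) for homophonic homographs at
# 100, 250, and 750 ms SOA, as listed in Table 3.
homophonic_dominant = [11, 23, 14]

# Average dominant-prime facilitation for heterophonic homographs,
# taken as stated in the text rather than recomputed.
heterophonic_mean = 35

homophonic_mean = mean(homophonic_dominant)
ratio = heterophonic_mean / homophonic_mean
print(f"homophonic mean: {homophonic_mean} ms; ratio: {ratio:.2f}")
```

The ratio comes out slightly above 2, in line with the "twice as strong" description.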

The purpose of replicating Experiment 1-a was, 
in fact, two-fold. First, because the comparison of 
heterophonic and homophonic homographs was 
based on a different pool of subjects, and because 
the most important difference relied on one data 
point, we aimed at reexamining the absence of a 
priming effect (or the possible inhibition) for 
heterophonic homographs at 100 ms SOA. Second, 
we sought to examine whether the larger priming 
effects found for heterophonic relative to homophonic 
homographs were due to an incidental overall 
slower performance of the subjects sampled in 
Experiment 1-a. 

The results of this replication are presented in 
Table 4. As in the original experiment, lexical 
decisions for targets related to the subordinate 
meanings of the primes were not facilitated at 100 
ms SOA. In addition, the nonsignificant trend of 
inhibition observed in this condition in 
Experiment 1-a proved to be unreliable. Overall, 
the RTs in the replication were faster than in the 
original experiment. This suggests that the 
subjects employed in Experiment 1-a were generally 
slower than all other subjects in this study. 

Table 4. Reaction times and (percentage of errors) to related and unrelated targets in the replication of Experiment 
1-a. 

                   Dominant Primes    Subordinate Primes    Nonwords 

SOA                100    250         100    250 

Unrelated          626    588         635    609            677 
                   (9%)   (10%)       (13%)  (11%)          (8%) 

Related            601    557         635    588 
                   (8%)   (5%)        (10%)  (10%) 

Priming Effect     +25    +31         0      +21 

Nevertheless, the pattern of the semantic 
facilitation with unvoweled ambiguous heterophonic 
homographs was replicated. The statistical 
significance of the priming effects was assessed by 
ANOVA across subjects (F1) and across stimuli 
(F2). The main effects of relatedness and 
dominance were significant [F1(1,39)=4.6, MSe=1436, 
p<0.04; F2(1,39)=5.0, MSe=1834, p<0.03; 
F1(1,39)=12.6, MSe=1552, p<0.001; F2(1,39)= 
9.9, MSe=2648, p<0.003; respectively]. The 
two-way interaction did not reach significance in the 
stimuli analysis [F1(1,39)=3.6, MSe=1741, p<0.06; 
F2(1,39)=2.4, MSe=1756, p<0.1]. Planned 
comparisons revealed that RTs to targets related to the 
subordinate alternatives of the prime meanings at 
250 ms SOA were significantly faster than RTs to 
targets in the unrelated condition [t(1,39)=3.1, 
p<0.004]. We will refer to the additional 
implications of this replication in the General Discussion. 

GENERAL DISCUSSION 
In the present study we examined the process of 
disambiguating Hebrew heterophonic and 
homophonic homographs presented in the absence of 
biasing context. To summarize the results of our 
investigation, it appears that regardless of relative 
dominance, at least two different meanings of 
each homograph were retrieved. However, the 
time course of activating the different meanings 
and possibly the amount of activation were 
influenced by phonological factors. With homophonic 
homographs, subordinate as well as dominant 
meanings were active as early as 100 ms from 
stimulus onset. On the other hand, with 
heterophonic homographs, only the dominant meaning 
was available at 100 ms SOA, whereas the 
availability of the subordinate meaning was delayed. 
In contrast to the differences found at the onset of 
meaning activation, the decay of activation of 
subordinate meanings of homophonic and 
heterophonic homographs was similar; they all remained 
active as late as 750 ms from stimulus onset. 
Thus, the onset activation pattern of Hebrew 
heterophonic homographs observed in the present 
study is in agreement with the ordered-access 
model suggested by Simpson and Burgess (1985). 
At present this conclusion must be limited to 
heterophonic homographs because, unlike Simpson 
and Burgess (1985), we did not use SOAs shorter 
than 100 ms. Our findings suggest, then, that 
heterophonic homographs and homophonic 
homographs are disambiguated differently. This 
difference, and the long-lasting activation of subordinate 
meanings of Hebrew but not English homographs, 
may provide some insights regarding the lexical 
structure and the process of word identification. 

The lexical representation of homophonic 
homographs is controversial. Some authors assert 
that homophonic homographs have different 
lexical entries, one for each meaning (Forster & 
Bednall, 1976; Jastrembski, 1981; Kellas et al., 
1988). Other authors claim that a homograph has 
only one lexical entry, related to multiple nodes in 
a semantic network (Seidenberg et al., 1982; 
Cottrell & Small, 1983). On the other hand, 
heterophonic homographs are, by definition, 
represented by several phonological units in the 
lexicon. Thus, phonologically ambiguous letter 
strings refer to different lexical entries, one for 
each phonological realization. The relatively 
delayed access to the subordinate meanings of 
heterophonic homographs, as compared with 
subordinate meanings of homophonic homographs, 
could be more easily accounted for by assuming 
only one lexical entry for homophonic homographs 
and several entries for heterophonic homographs.³ 
According to such a model, the alternative lexical 
entries are automatically activated by the unique 
orthographical pattern, though at different onset 
times. The present data and the results of our 
previous studies (e.g., Bentin & Frost, 1987) 
indicate that, in the absence of biasing context, the 
order of activation is determined by the relative 
word frequency; higher-frequency words are 
accessed before lower-frequency words. As a 
consequence of the multiple-entries structure and 
the ordered-access process, heterophonic 
homographs are phonologically disambiguated 
before the semantic network is accessed. Each 
activated word (in the lexicon) is unequivocally 
related to a meaning. Because entries of dominant 
words are accessed before those of subordinate 
words, the origin of the dominance effect on the 
time course of activating the meanings of a 
heterophonic homograph could have been the 
well-documented frequency effect on lexical access. 

This interpretation may also account for the 
overall greater priming effects found for 
heterophonic than for homophonic homographs. It 
might suggest that when one lexical unit 
activates two or more semantic nodes, each of 
these nodes is activated less than nodes which are 
unequivocally related to phonological units in the 
lexicon. If, in contrast to heterophonic 
homographs, homophonic homographs were 
represented by only one lexical entry which is 
related to several semantic nodes, the process of 
disambiguating the different meanings should 
have been less affected by the relative frequency 
(dominance) of using each meaning. Our 
hypothesis is that activating a lexical entry in an 
unbiased semantic context should automatically 
initiate the retrieval of all its related meanings. 
Because only one lexical entry is active, the 
relative dominance of the alternative meanings is 
irrelevant at the stage of lexical access. Relative 
frequency factors might affect the order of their 
retrieval at later processing stages, but our results 
suggest that, at least for the SOAs that have been 
examined in the present study, such an effect was 
not observed. 

One caveat that must be considered while 
interpreting the difference in the amount of 
priming with homophonic vs. heterophonic 
homographs is that responses to the former were 
overall faster than to the latter. The reduction in 
overall RTs in the replication of Experiment 1-a 
relative to the original experiment, and the 
comparison of the nonword data across all 
experiments, help to clarify this issue. Because the 
RTs to nonwords in the replication were identical 
to those in Experiment 3-a, we can assume that 
these two groups of subjects were comparable in 
overall speed of performance. Nevertheless, RTs to 
targets related to heterophonic homographs were 
slower by about 40 ms than RTs to targets related 
to homophonic homographs. This difference was 
not entirely unexpected; it conforms with previous 
findings in Hebrew showing faster RTs for 
phonologically unequivocal words than for 
phonologically ambiguous words (Bentin et al., 
1984). However, this pattern might have caused a 
floor effect in the RTs to homophonic homographs 
that attenuated the absolute magnitude of the 
priming effect. A floor effect as a sole explanation 
for this attenuation is not entirely supported by 
the data. First, note that although the overall 
difference in RTs in the two homographic 
conditions was reduced by a factor of 3 (from 120 
to 40 ms) when the replication rather than the 
original experiment was considered, the 
respective reduction of the priming effect for the 
dominant alternatives was relatively small (from 
35 ms to 28 ms). Second, the smallest effect of 
semantic facilitation for homophonic homographs 
(11 ms) was obtained for dominant primes at 100 
ms SOA that were slower by 50 ms than the 
primes at 250 ms SOA (which revealed a much 
larger facilitation). Thus, the observed difference 
in the magnitude of the priming effects between 
homophonic and heterophonic homographs is 
consistent with the hypothesis that they have 
different lexical structures. 
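
The first point of the argument against a pure floor-effect account can be made concrete with a few lines of arithmetic. The sketch below uses only values quoted in the text and in Table 4; the variable names are our own.

```python
# All values in ms, taken from the text and from Table 4.
rt_gap_original = 120     # heterophonic minus homophonic RTs, original experiment
rt_gap_replication = 40   # the same gap when the replication is used instead

priming_original = 35                  # dominant primes, original experiments
priming_replication = (25 + 31) / 2    # dominant primes at 100 and 250 ms SOA (Table 4)

# Factors by which each quantity shrank between the original and the replication.
rt_gap_shrink = rt_gap_original / rt_gap_replication
priming_shrink = priming_original / priming_replication

print(f"RT gap shrank by a factor of {rt_gap_shrink:.2f}")
print(f"priming effect shrank by a factor of {priming_shrink:.2f}")
```

If the larger priming effect for heterophonic homographs were driven mainly by slower overall responses, the two factors would be expected to be closer to each other.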

In addition to the implications for the lexical 
structure, our data are also relevant to arguments 
regarding the use of phonology in word 
identification. One class of models suggests that 
(with the possible exception of very infrequent 
words), printed words activate orthographic units 
that are directly related to meanings in semantic 
memory (e.g., Seidenberg, 1985; Seidenberg, 
Waters, Barnes, & Tanenhaus, 1984). Such a 
mechanism has been invoked to explain, for 
example, how homophones such as SALE and 
SAIL are correctly understood, or how patients 
with acquired dyslexia can understand written 
words without being able to read them aloud (Kay 
& Patterson, 1985). 

An alternative class of models asserts that, most 
of the time, access to meaning is mediated by 
phonology (e.g., Perfetti, Bell, & Delaney, 1988; 
Van Orden, Johnston, & Hale, 1988). The latter 
class of models is supported by theoretical 
considerations such as the parsimony of having 
only one mechanism that mediates access to 
meaning for both speech and reading (e.g., 
Liberman & Mattingly, 1989), and by evidence 
that when the ability to derive phonology from 
print is poor (as in deep dyslexia), semantic errors 
in reading are abundant (for a review, see 
Marshall & Newcombe, 1980). Moreover, when the 
direct connection from orthography to meaning is 
impaired (as in some patients with surface 
dyslexia), the meaning of printed words can be 
retrieved by pre-lexical application of 
grapheme-to-phoneme (GTP) transformation rules 
(Coltheart, Masterson, Byng, Prior, & Riddoch, 
1983; Marshall & Newcombe, 1973). 

As we have pointed out in previous papers, 
lexical decisions for unvoweled Hebrew words are 
based primarily on orthographic codes (Bentin et 
al., 1984; Bentin & Frost, 1987), and even for 
naming, pre-lexical word phonology is not usually 
used by skilled readers (Frost et al., 1987). 
Nevertheless, the present results suggest that, in 
contrast to lexical decisions, the retrieval of 
meaning requires the activation of the phonological 
structure to which the printed word refers. If 
meaning were retrieved directly from the 
orthographic input, no difference should have been 
found between processing homophonic and 
heterophonic homographs. The delayed onset of 
activating the meanings of the subordinate 
phonological alternatives relative to the subordinate 
meanings of homophonic homographs, and possibly the 
overall more robust priming effects observed when 
the primes were phonologically ambiguous than 
when they were homophonic homographs, 
suggest that the former involved phonological 
disambiguation prior to the disambiguation of 
meaning. 

One of the most intriguing results of the present 
study was that subordinate meanings of both 
heterophonic and homophonic homographs were still 
available and used 750 ms from stimulus onset. 
This result contrasts with the relatively fast decay 
of subordinate meanings of English homographs 
(Simpson & Burgess, 1985; Kellas et al., 1988). 
Because the decay pattern was similar for both 
types of Hebrew homographs, the divergence from 
English should probably be accounted for by 
language-related factors. One possible source of the 
different results obtained in Hebrew and in 
English may be related to the homographic 
characteristics of the Hebrew orthography. Hebrew, 
like other Semitic languages, is based on word 
families derived from tri-consonant roots. The root 
is contained in all of its derivations; therefore, 
Hebrew contains many homophonic and 
heterophonic homographs. The wide spread of 
homography might have shaped the reader's reading 
strategies. Because ambiguity is so prevalent in 
reading, the process of semantic and phonologic 
disambiguation is governed mainly by context. As 
the disambiguating context often follows rather 
than precedes the ambiguous homographs, the 
most efficient strategy of processing them should 
consist of maintaining their phonologic or 
semantic alternatives in working memory until the 
context selects the appropriate one. Note that by this 
interpretation the subordinate alternatives do not 
decay automatically, but remain in memory until 
disambiguation by context has occurred. However, 
a complete account of the specific characteristics 
of the Hebrew orthography which might have 
influenced our results with homophonic and 
heterophonic homographs deserves further 
investigation. 

FOOTNOTES 

*Journal of Experimental Psychology: Learning, Memory, and 
Cognition, 18, 58^. 

†Department of Psychology, The Hebrew University, Jerusalem. 

¹Simpson and Burgess (1985) found no difference in RTs between 
SOAs of 100 and 300 ms. However, throughout the present 
study, the main effect of SOA was quite reliable and robust 
(including in the replication of Experiment 1-a), suggesting that, 
unlike in Simpson and Burgess's study, in the present study the 
100 ms SOA condition was more difficult than the other SOA 
conditions. A possible explanation of this difference is that our 
procedure did not include an initial fixation point. 

²Although phonologic ambiguity is very common in the Hebrew 
orthography, the set of stimuli used in the experiments was 
constrained by many experimental controls such as mean rated 
frequencies, dominance as reflected by naming performance, 
syntactic classes, rated semantic relatedness, etc. This set of 
stimuli did not permit a within-subject design across all SOAs. A 
similar problem was raised and solved similarly by Simpson and 
Burgess (1985). 

³Seidenberg et al. (1982) present a similar kind of single vs. 
multiple argument for noun-noun vs. noun-verb homophonic 
homographs, but they draw slightly different inferences. They 
argue that noun-verb homographs (e.g., train) have different 
entries in the lexicon and, hence, both meanings are always 
accessed for such words even when a strong priming word is 
presented in the context. In contrast, for noun-noun ambiguities 
(e.g., boxer) there is only one entry in the lexicon, and meanings 
are accessed in order of relative activation levels. 

Attention Mechanisms Mediate the Syntactic Priming 
Effect in Auditory Word Identification* 

Avital Deutsch† and Shlomo Bentin† 



The effect of syntactic priming and the involvement of attention in that process were 
investigated by testing the identification of spoken Hebrew words presented in sentences. Target 
words were masked by white noise and were either congruent or incongruent with the 
syntactic structure of the sentence. In comparison to a neutral condition, the identification 
of congruent targets was facilitated and identification of incongruent targets was 
inhibited, equally. When congruent and incongruent sentences were presented in separate 
blocks, the inhibition effect was attenuated whereas the facilitation was not affected. The 
introduction of S50 ms silent ISI between the context and the target increased the 
inhibition without affecting the facilitation. We suggest that the facilitation as well as the 
inhibition effects of syntactic priming are based on a veiled controlled process of 
generating expectations. The inhibition is caused by an additional controlled process of 
re-evaluation of the auditory input triggered by syntactic incoherence. The latter process 
requires additional attentional resources. 



There is much evidence in the research 
literature that syntactic context influences the 
process of word recognition (Carello, Lukatela, & 
Turvey, 1988; Goodman, McClelland, & Gibbs, 
1981; Gurjanov, Lukatela, Moskovljević, & 
Turvey, 1985; Katz, Boyce, Goldstein, & Lukatela, 
1987; Lukatela, Kostić, Feldman, & Turvey, 1983; 
Lukatela, Moraca, Stojnov, Savić, Katz, & 
Turvey, 1982; Marslen-Wilson, 1987; Seidenberg, 
Waters, Sanders, & Langer, 1984; Tanenhaus, 
Leiman, & Seidenberg, 1979; Tyler & Wessels, 
1983; West & Stanovich, 1986; Wright & Garrett, 
1984). The common finding is that performance is 
faster and more accurate if the target words are 
congruent with the syntactic structure into which 
they are integrated than when they are 
incongruent. This differential performance was 
found mostly in tasks such as lexical decision and 
naming (Carello et al., 1988; Goodman et al., 
1981; Katz et al., 1987; Gurjanov et al., 1985; 
Lukatela et al., 1982; 1983; Seidenberg et al., 
1984; Tanenhaus et al., 1979; West & Stanovich, 
1986; Wright & Garrett, 1984). In analogy with 
the effects of semantic context in similar tasks, 
the influence of the syntactic context has often 
been labeled grammatical or syntactic priming. 
However, because the term priming has been 
borrowed from the semantic domain, the use of 
the term "priming" in the syntactic domain needs 
specific consideration. In the semantic domain, 
priming refers primarily to a process that 
influences the identification of a particular lexical 
entry (Forster, 1981; Seidenberg, 1982). The 
syntactic context, on the other hand, refers 
primarily to a particular grammatical form of the 
word, which may or may not be independently 
represented in the lexicon. Therefore, syntactic 
priming may affect the identification of a 
particular grammatical structure, without direct 
influence on accessing a particular lexical entry. It 
is in this sense that we will adopt here the term 
syntactic priming. 

Like other priming phenomena, syntactic 
priming might also reflect the combined or 
independent contribution of two basic components: 
One is the facilitation of processing syntactically 
congruent targets due to the agreement between 
the observed grammatical form and that predicted 
by the syntactic structure. The other is the 
inhibition of processing incongruent targets either 
because they do not conform with previous 
expectations or because they may require 
additional processing aimed at resolving the 
amorphic input, or both. Several studies have 
interpreted syntactic priming in terms of 
facilitation (Katz et al., 1987; Lukatela et al., 
1982; 1983; Marslen-Wilson, 1987; Tyler & 
Wessels, 1983), while others have emphasized the 
inhibitory aspect (Tanenhaus et al., 1979; West & 
Stanovich, 1986; Carello et al., 1988). However, 
the question of whether facilitation or inhibition, 
or both, are operative is unsettled because, with 
the exception of one study in which only inhibition 
was found (West & Stanovich, 1986), the syntactic 
priming effect has not been assessed relative to a 
neutral condition. 

The distinction between facilitation and 
inhibition is important because each of these two 
processes might reflect a different cognitive 
mechanism. In particular, current models of 
priming suggest that facilitation and inhibition 
differ in their attentional requirements. In normal 
language communication syntactic congruity is 
expected. Therefore, it might be expected that 
syntactically congruent targets are automatically 
integrated into the sentence structure. In 
contrast, syntactically incongruent targets cannot 
be automatically integrated into the syntactic 
context. Therefore, they may require some 
re-evaluation of the sensory input as well as of the 
context. In the semantic domain it is assumed 
that these activities, which inhibit word 
identification, are actively controlled and require 
the allocation of attention resources (Neely, 1977; 
Posner & Snyder, 1975). 

The role of attention in syntactic priming has 
been approached indirectly in earlier studies. For 
example, dealing with inflectional morphology, 
Katz et al. (1987) suggested a modular syntactic 
processor whose involvement in word recognition 
is mandatory and informationally encapsulated 
(Fodor, 1983). This interpretation implies that 
syntactic priming, particularly as it relates to 
facilitatory processes, should not require 
attentional resources. Indeed, several authors 
have proposed that syntactic priming is automatic 
(Carello et al., 1988; Gurjanov et al., 1985; 
Lukatela et al., 1982; see also Seidenberg et al., 
1984). Note, however, that in the studies just 
cited, the automaticity of the syntactic priming 
effect was suggested primarily by inflectional 
processing in pairs of words presented in the 
highly inflected Serbo-Croatian language. Testing 
English-speaking subjects with word-pair 
materials, Goodman et al. (1981) found evidence 
that syntactic priming may be strategy-controlled 
and modulated by attention. A role for attention in 
syntactic priming can be inferred indirectly from 
the assumption that attention is involved 
primarily during lexical (or post-lexical) processes 
which are involved in lexical decision more than in 
naming. Indeed, several studies using single-word 
context in English (Seidenberg et al., 1984) as well 
as in Serbo-Croatian (Carello et al., 1988) reported 
that syntactic priming in naming was 
significantly smaller than in lexical decision, or 
nonexistent. In addition, using sentential context, 
West and Stanovich (1986) found significant 
inhibition for incongruent targets without 
facilitation of congruent targets. 

The involvement of attention may be especially 
conspicuous in the case of incongruent targets, 
when re-evaluation of the target/sentence 
relationship, although possibly unavoidable, 
necessarily requires attentional resources. Such 
an interpretation of the syntactic priming effect 
was suggested by Tanenhaus et al. (1979). 
Examining the process of selecting the 
contextually appropriate readings of noun-verb 
ambiguities in sentences, these authors suggested 
that the syntactic selection process is 
characterized by a veiled controlled mechanism 
which makes use of context to suppress the 
inappropriate meaning (see Shiffrin & Snyder, 
1977). Applied to syntactic priming, Tanenhaus et 
al. (1979) suggest that the inhibition of 
incongruent targets is caused by a controlled, yet 
unavoidable (therefore "veiled") process of 
matching the incongruent sensory input with the 
expected syntactic structure. 

To summarize, the present evidence for a role of 
attention in syntactic processing is not conclusive. 
Indeed, most authors suggest that the application 
of syntactic rules is mandatory and does not 
require much attention. However, the empirical 
basis for this conclusion is weak. First, attention 
was not directly manipulated in any of those 
studies. Second, the conclusions were based 
mostly on studies of syntactic priming by 
single-word context. Finally, the absence of neutral 
conditions in most studies prevents any 
distinction between the facilitatory and inhibitory 
components of the syntactic priming effect. The 
present study is a systematic investigation of the 
syntactic priming effect in spoken sentences. We 
sought to determine the relative contribution of 
facilitatory and inhibitory mechanisms to 
syntactic priming and to examine the attention 
requirements of each of these mechanisms. 

Methodological Considerations 

In the present study, we manipulated the 
Hebrew agreement rule between subject and 
predicate regarding gender and number, and a 
morpho-syntactic rule that involves the 
decomposition of the conjunctive form of 
pronoun-plus-preposition. These rules were chosen for two 
reasons. First, we aimed at isolating the influence 
of the syntactic context from the influence of the 
semantic context. Both agreement between subject 
and predicate, and the morpho-syntactic rule that 
we employed, are simple and essential in Hebrew 
grammar. The essential role of an agreement rule 
in Hebrew is to specify the syntactic relation 
between the constituents of a sentence, and it has no 
effect on the semantic information. For example, 
the predicate agrees with the subject in person, 
gender, and number, but because the specification 
of the gender and number is already available in 
the subject, violation of one or more of these types 
of agreement does not affect the meaning of the 
sentence. Moreover, because the agreement rule is 
at the level of inflectional morphology, violation of 
it does not cause changes in word class (changes 
that may have semantic implications; Carello et al., 
1988). 

Second, the particular agreement rule that we 
chose operates between sentential elements, like 
the subject and the predicate, and not at the 
phrase level as, for example, the agreement 
between subject and attribute. Therefore, we were 
not constrained to present the subject and the 
predicate in succession, thus emphasizing the 
sentence rather than the phrase level. Because of 
the minimal involvement of semantic factors, and 
the possibility of dealing with syntactic rules beyond 
the phrase level, we believed that the rules we 
used were appropriate for exploring the syntactic 
priming effect.¹ In addition, it should be 
emphasized that none of the targets used 
represented a high-cloze completion of the sentence. 
Therefore, subjects could not simply predict the 
target and use a semantically induced 
word-guessing strategy. 

Most of the previous studies of the effect of 
syntactic context (with the exception of Katz et al., 
1987; Marslen-Wilson, 1987; and Tyler & Wessels, 
1983) used visually presented stimuli. In the 
present study, we have examined syntactic 
priming in speech perception rather than reading 
because speech is more basic than reading in 
human language and is perhaps less affected by 
learned strategies. 

Previous studies of semantic or associative 
priming in the visual modality suggested that the 
degradation of stimulus intelligibility magnifies 
the effect of contextual influence on word 
recognition (Becker & Killion, 1977; Meyer, 
Schvaneveldt, & Ruddy, 1975; Neely, 1991; 
Stanovich & West, 1983). Therefore, in an attempt to 
focus our investigation on the nature of the 
syntactic context effect, our basic task required 
the identification of target words masked by white 
noise. 

EXPERIMENT 1 

The purpose of the present experiment was to 
assess the relative contribution of facilitatory and 
inhibitory processes to syntactic priming. In a 
previous study (Bentin, Deutsch, & Liberman, 
1990) we observed a large syntactic context effect 
on the identification of words masked by white noise. 
The identification of target words was four times 
as accurate when they were syntactically 
congruent as when they were incongruent with 
the context sentence. In the present experiment 
we replicated and extended our former study by 
adding a neutral condition. The addition of the 
neutral condition enabled us to disentangle the 
facilitatory effect of syntactic congruity and the 
inhibitory effect of syntactic incongruity that were 
confounded in our previous study (see also West & 
Stanovich, 1986; Neely, 1976). 

The neutral context that we used with all 
targets was "The next word is...," as was originally 
suggested by McClelland and O'Regan (1981) and 
applied to an investigation of syntactic priming in 
reading by West and Stanovich (1986). We chose 
this neutral condition because it probably 
involves no syntactic bias toward specific syntactic 
structures or word classes (West & Stanovich, 
1986). 

We assumed that the facilitatory and inhibitory
components that may contribute to the syntactic
priming effect should be differentially reflected in
comparison to the neutral condition. Facilitation
was measured by the difference between the
percentage of correct target identification in the
congruent and the neutral contexts, whereas the
difference between the correct identification in the
neutral and the incongruent context conditions
was the measure of inhibition.

Method

Subjects. The subjects were 30 undergraduate
students who participated in the experiment for
course credit or for payment. They were all native
speakers of Hebrew, without any known hearing
problems.

Test Materials. The auditory identification test
included 44 target words. Targets were the last
word in a three- or four-word sentence. Each
target was embedded in three different sentences,
which defined three different conditions of the
syntactic context: 1. "Congruent": the target word
fitted the syntactic structure of the sentence. 2.
"Incongruent": the target word did not fit the
syntactic structure of the sentence, that is, caused
a violation of a syntactic rule. 3. A "Neutral"
condition as explained above.

The syntactic violations were constructed by 
changing the congruent sentences in one of the 
following ways. 

Type 1: Violation of the agreement in gender
between subject and predicate. This category
included 12 target words repeated across the three
context conditions, forming a total of 36 sentences.
In the incongruent condition a masculine subject
was presented with a feminine predicate (in 6 of
the sentences) or vice versa (in the other 6
sentences), that is, a feminine subject presented
with a masculine predicate.

Type 2: Violation of the agreement in number 
between subject and predicate. Twelve target 
words (other than in Type 1) were repeated across 
the three conditions forming 36 sentences. In the 
incongruent condition a singular predicate 
followed a subject in a plural form (in 6 of the 
sentences), or vice-versa (in the other 6 
sentences). 

Type 3: Violation of the agreement in both 
gender and number between subject and 
predicate. This category also included 12 target
words (different from those in Types 1 and 2),
repeated across conditions to form 36 sentences.
In the incongruent condition the compatibility of 
gender and number between the subject and 
predicate was altered in each sentence. For 
example: A masculine singular subject was 
followed by a feminine plural predicate. (We 
constructed all the 4 possible combinations, 3 
sentences for each). 

Type 4: Decomposition of the conjunctive form of
pronoun and preposition. This category included 8
target pronouns, each of which was combined with
a different preposition, forming 24 sentences. In
Hebrew, the pronoun and the preposition are
always in a conjunctive form. Thus, in the
incongruent condition, the conjunctive form was
decomposed into its two elements. For example:
The conjunctive form "alecha" ("on you") was
presented as two separate words: "al" (the
preposition "on") and "ata" (the pronoun "you"). In
the neutral condition the targets were presented
as normal conjunctions.

The sentences of Types 1 to 3 were formed of
three words in the following order: Subject,
attribute and predicate. The masked target was
always the predicate. The predicate was either a
verb or an adjective (participle form in nominal
clauses). Type 4 sentences were formed of a
subject, a predicate and a verbal completion (the
conjunctive pronoun). The masked targets were
the verbal completions in their normal conjunctive
form (congruent and neutral conditions) or
decomposed (the incongruent condition).

The sentences were organized in 3 lists of 60
sentences, 20 in each congruity condition. Each
group of 20 included 12 manipulations of the
agreement rule (Types 1 to 3) and 8 manipulations
of the morpho-syntactic rule (Type 4). The targets
in sentences of Types 1 to 3 were rotated so that
each subject heard each target only once but, across
subjects, each target appeared in each congruity
condition. Because the number of the pronouns is
limited, the rotation of pronouns between
congruity conditions was within subjects, so that
each appeared 3 times in a list (once in the
decomposed form). In order to avoid the repetition
of priming as much as possible, a different context
was used in each condition. Moreover, the contents
were counterbalanced across the three lists.
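
The rotation of the agreement targets (Types 1 to 3) across the three lists can be illustrated with a Latin-square assignment. The following is only a minimal sketch: the indexing scheme, the function name `build_lists`, and the use of integer target indices are assumptions made for illustration, not the procedure actually used to build the lists.

```python
CONDITIONS = ("congruent", "neutral", "incongruent")
N_AGREEMENT_TARGETS = 36  # Types 1 to 3, 12 targets each


def build_lists(n_targets=N_AGREEMENT_TARGETS, conditions=CONDITIONS):
    """Return one {target_index: condition} mapping per list.

    Hypothetical Latin-square rotation: the condition of each target is
    shifted by one step from list to list, so every target occurs once
    per list and in every condition across lists.
    """
    n_lists = len(conditions)
    return [
        {t: conditions[(t + shift) % n_lists] for t in range(n_targets)}
        for shift in range(n_lists)
    ]


lists = build_lists()
for number, assignment in enumerate(lists, start=1):
    counts = {c: list(assignment.values()).count(c) for c in CONDITIONS}
    print(f"list {number}: {counts}")
```

With 36 targets and three conditions, each list receives 12 agreement targets per condition, which matches the 12 agreement manipulations per congruity condition described above.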

All the sentences were recorded on tape by a
female who was a professional speaker of Hebrew.
The tapes were digitized at 20 kHz and edited as
follows. The duration of the mask was equal in all
sentences, determined by the duration of the longest
target. The white noise was digitally added to the
target, starting slightly before onset with a signal-to-noise
ratio of 1:3.4. This ratio was determined
on the basis of pilot tests, so that correct target
identification level was about 50%.
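
As a rough illustration of the masking step, the sketch below scales white noise so that the ratio of target level to noise level is 1:3.4. It assumes that the ratio refers to RMS amplitude (the text does not say whether amplitude or power is meant), it uses a synthetic tone in place of a recorded word, and it ignores the early noise onset and the fixed mask duration. The function names are hypothetical.

```python
import math
import random

SAMPLE_RATE = 20_000  # Hz, the digitization rate mentioned above


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def add_white_noise(target, signal_to_noise=1 / 3.4, seed=0):
    """Return (masked target, scaled noise).

    Assumption: signal_to_noise is the ratio of RMS amplitudes,
    rms(target) / rms(noise).
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in target]
    gain = rms(target) / (signal_to_noise * rms(noise))
    scaled = [gain * n for n in noise]
    masked = [s + n for s, n in zip(target, scaled)]
    return masked, scaled


# A 100 ms, 440 Hz tone stands in for a recorded target word.
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(2_000)]
masked, noise = add_white_noise(tone)
print(round(rms(tone) / rms(noise), 3))
```

If the reported ratio were a power ratio instead, the gain would be computed from the squared RMS values; the structure of the sketch would otherwise stay the same.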

The sentences in each list were randomized and 
output to tape at a 2 second inter-sentence 
interval at a comfortable loudness. 

Procedure. Subjects were randomly assigned to 
one of the three stimuli lists. Each subject was 
tested individually. The experimenter and the
subject listened to the stimuli simultaneously, 
both using earphones (HD-420). 

The subject was instructed to listen to the 
sentence and to repeat the last (masked) word
during the silence interval at the end of each
sentence. No time constraints were imposed; in a 
few instances when the subject's response was 
delayed relative to the inter-sentence interval, the 
experimenter stopped the tape-recorder. The 
responses were recorded manually by the 
experimenter. 

The experimental session began with 12 practice 
trials (4 sentences in each condition), followed by 
the test list.

Results 

Subjects' responses were initially coded as
correct (accurate identification of the inflected
word) or error. The errors made in the
incongruent condition were further categorized
into four types: 1) "Self correction" (a correction of
the syntactic violation using the same root); 2)
"Random completion" (a totally different root
forming a semantically and syntactically
congruent sentence); 3) "Nonsense" (any
completion which was semantically meaningless
or syntactically incongruent, including nonwords);
4) "No response" ("I don't know"). In the neutral
and congruent conditions only the last three
categories were possible.

Because in our previous study (Bentin et al., 
1990) the congruity effect on the four types of
syntactic rules was similar, we collapsed our 
analysis over the sentence types. 

Across subjects or stimuli, the percentages of
correct identification were 74.8%, 50.2%, and
27.3% for the congruent, neutral, and incongruent
syntactic conditions, respectively (Figure 1).

The statistical significance of the congruity
effect was examined by one-factor analyses for
subjects (F1) and stimuli (F2). The main effect of
syntactic context was significant [F1(2,58)=110.5,
MSe=153, p<.0001 and F2(2,118)=49.8, MSe=661,
p<.0001].
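
The two analyses differ in the unit over which trial-level accuracy is averaged before the analysis of variance is run: subjects for F1 and stimuli (target words) for F2. The sketch below shows only that aggregation step on a toy data set; the trial records, field names, and the function `percent_correct_by` are hypothetical and do not reproduce the actual data or the ANOVA itself.

```python
from collections import defaultdict


def percent_correct_by(trials, unit):
    """Percent correct per (unit, condition); unit is 'subject' or 'item'."""
    totals = defaultdict(lambda: [0, 0])  # key -> [n correct, n trials]
    for trial in trials:
        key = (trial[unit], trial["condition"])
        totals[key][0] += int(trial["correct"])
        totals[key][1] += 1
    return {key: 100.0 * hits / n for key, (hits, n) in totals.items()}


# Hypothetical toy data: two subjects, two target words.
trials = [
    {"subject": "s1", "item": "w1", "condition": "congruent", "correct": True},
    {"subject": "s1", "item": "w2", "condition": "incongruent", "correct": False},
    {"subject": "s2", "item": "w1", "condition": "incongruent", "correct": False},
    {"subject": "s2", "item": "w2", "condition": "congruent", "correct": True},
]

by_subject = percent_correct_by(trials, "subject")  # cell means entering F1
by_item = percent_correct_by(trials, "item")        # cell means entering F2
print(by_subject)
print(by_item)
```

Running both aggregations on the same trials makes it explicit that F1 treats subjects as the random factor while F2 treats target words as the random factor.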

The distribution of errors is presented in Table
1. Statistical evaluation of the distributions
(ANOVA followed by Tukey-A post-hoc
comparison) showed that within each congruity
condition, all differences were reliable at the p<.05
level.

Table 1. Mean percentage of errors (SEm) of each type in each congruity condition.

                                  ERROR TYPE
CONGRUITY      SELF           RANDOM                        NO
CONDITION      CORRECTION     COMPLETION     NONSENSE       RESPONSE

CONGRUENT                     62.0% (4.1)    2.9% (1.9)     33.4% (4.3)
NEUTRAL                       54.6% (4.0)    0.5% (0.5)     42.8% (3.9)
INCONGRUENT    12.1% (U)      19.3% (2.1)    3.3% (1.2)     62.2% (3.2)


Discussion 

The results of Experiment 1 demonstrated that 
the syntactic priming effect, as it is revealed in 
our auditory word identification paradigm, 
consists of two components, facilitation and 
inhibition. The relative contribution of each 
component to the global context effect is 
approximately equal: Congruent context improved 
identification of white-noise-masked words by
about 23% while incongruous context reduced
identification by the same amount, from a neutral
baseline of about 50% correct.
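
The decomposition into facilitation and inhibition is simple arithmetic on the three percent-correct values. The sketch below applies the definitions given in the introduction to Experiment 1 to the reported values; the congruent value (74.8%) is the one quoted for Experiment 1 in the discussion of Experiment 2, and the function name is hypothetical.

```python
# Percent correct in Experiment 1, as reported in the text.
PERCENT_CORRECT = {"congruent": 74.8, "neutral": 50.2, "incongruent": 27.3}


def priming_components(pct):
    """Split the overall context effect into its two components.

    facilitation = congruent - neutral
    inhibition   = neutral - incongruent
    """
    facilitation = pct["congruent"] - pct["neutral"]
    inhibition = pct["neutral"] - pct["incongruent"]
    return facilitation, inhibition


facilitation, inhibition = priming_components(PERCENT_CORRECT)
print(f"facilitation: {facilitation:.1f} percentage points")
print(f"inhibition:   {inhibition:.1f} percentage points")
```

Both differences are expressed in percentage points relative to the neutral baseline, which is the sense in which the two components are described as approximately equal.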

Before discussing these results any further, a
trivial interpretation should be considered.
Because only verbatim accurate responses were
considered correct, it could have been the case
that the pattern of facilitation and inhibition
simply reflected that, facing uncertainty, subjects
identified the word-root and completed the
inflection using an intelligent-guessing strategy.
According to this interpretation, the difference in
the percentage of correct identification of inflected
targets in the congruent and incongruent conditions
would reflect the correspondence or disagreement
between the subject's intuition about how the
identified root should have been inflected and
what was actually presented. Such a strategy,
however, implies that a) targets' roots were
identified better than their inflected forms and b)
in the incongruent condition there would have
been a high percentage of Type 1 errors (i.e.,
errors reflecting the inadequate use of the correct
syntactic form). The first implication could not
hold in the present experiment because, as
mentioned in the methodological considerations,
there was no strong semantic constraint which
could have facilitated an independent identification
of roots on a semantic basis. The second was
rejected by the analysis of errors.

As revealed in Table 1, the percentage of self
correction in the incongruent condition was very
small, far smaller than the percentage of no
responses. Note also that the percentage of
random completions (i.e., substituting the target
with an incorrect but semantically and
syntactically congruent word) was also relatively
low in this condition. This pattern does not
support the "intelligent-guessing strategy" and
suggests instead that the low percentage of correct
identification in this condition reflected a general
process of inhibition caused by syntactic
incongruence.

Additional support for our interpretation is
provided by comparing the pattern of random
completions and no responses in the incongruent
condition with those observed in the neutral and
congruent conditions. It is evident that the
tendency to substitute a different but logical word
for the misidentified target (random completions)
is higher in the neutral than in the incongruent
condition and even higher in the congruent
condition. On the other hand, the tendency to say
"I don't know" (no response) is lower in the
congruent and neutral conditions than in the
incongruent condition. This tendency can be
explained assuming that syntactic incongruence
inhibited identification and enhanced uncertainty.
The absence of syntactic incongruence in the
congruent condition eliminated inhibition, and
reduced uncertainty even when targets were
misidentified. As a result, the percentage of
random completions in the congruent condition
was nearly twice as large as the percentage of no
responses.

The present results diverge from those reported
by West and Stanovich (1986) who, using a similar
neutral condition, found only inhibition. However,
in addition to differences in task (West and
Stanovich used a visual lexical decision task), the
two studies differ in several other meaningful
ways and, therefore, cannot be straightforwardly
compared. First, we presented auditory masked
words whereas West and Stanovich (1986) used
visually presented unobstructed stimuli. Although
we have no evidence for a differential effect of
context in speech perception and reading, we
cannot ignore this possibility. Moreover, empirical
findings on associative and semantic priming in
reading suggest that context effects are larger for
degraded than for undegraded words (Stanovich &
West, 1983). It is also possible that the divergence
between the two studies is partly accounted for by
differences between the materials used in the two
studies. In contrast to the semantically anomalous
sentences used by West & Stanovich (1986), our
sentences were always semantically sound.

Since we have no direct evidence about the
influence of the above-mentioned factors on context
effects and how they interact with syntactic
priming, our ability to draw general conclusions is
limited. Therefore, inferences regarding the
existence of facilitatory and inhibitory components
of syntactic priming, and especially the finding of
the equal contribution of the two components, may
be restricted to the specific conditions of the
present demonstration. Despite this limitation, we
can continue our general course and investigate
the involvement of attention mechanisms with each
of these two components.

EXPERIMENT 2 

In the present experiment we examined the 
influence of presenting congruent and incongruent 
sentences in separate or mixed blocks on the 
inhibitory and facilitatory components of the 
syntactic priming effect. 

Studies of semantic priming in visual word 
perception generally showed that lowering the 
proportion of related targets in the list reduced 
the amount of inhibition (Fischler & Bloom, 1979; 
Stanovich & West, 1981; but see Stanovich & 
West, 1983, Experiment 4). Within the framework 
of the two-process theory of Posner & Snyder 
(1975), most authors have assumed that the 
influence of the ratio between related and 
unrelated targets is mediated by attention 
mechanisms (e.g., Fischler & Bloom, 1985; 
Stanovich & West, 1983; Tweedy, Lapinski, & 
Schvaneveldt, 1977). Specifically, it has been
assumed that lowering the proportion of related 
targets discourages word perception strategies 
based on context-related expectations. 

A similar manipulation was used to compare
semantic vs. syntactic priming effects in visual
word perception (Goodman et al., 1981;
Seidenberg et al., 1984). These studies suggested
that the syntactic priming effect is mediated
primarily by post-lexical, strategic mechanisms.
In these studies, however, no attempt was made to
examine the effect of separately manipulating
strategies designed to operate selectively on the
facilitatory and inhibitory components of the
syntactic priming effect. We applied the blocked
vs. mixed presentation technique to disentangle
the effect of attention mechanisms on each of
these two components.

The blocked condition is an extreme case of 
manipulating the ratio between incongruent and 
congruent sentences, where the proportion of 
incongruent stimuli is either 1:0 or 0:1. This 
proportion was contrasted with a 1:1 ratio of 
incongruent and congruent stimuli used in the 
mixed condition. Therefore, the comparison 
between the blocked and mixed modes of 
presentation should maximize the effect of 
attentional processes that may mediate syntactic 
priming. A differential effect of the presentation 
mode on the percentage of correctly identified 
words in congruent and incongruent sentences 
should suggest that attention mechanisms are 
differentially involved in the mediation of the 
facilitatory and inhibitory components of the 
syntactic priming effect. Particularly, the 
involvement of attention mechanisms should 
reduce interference in the blocked presentation, 
leading to a higher percentage of identification of 
incongruent targets. On the other hand, the
absence of an interaction between the mode of
presentation and the congruity of the sentence
should indicate that attention mediates the two
components to a similar extent.

Method 

Subjects. The subjects were 60 undergraduate 
students who did not take part in the first 
experiment. They participated in this experiment 
for course credit or for payment. They were all 
native speakers of Hebrew, without any known 
hearing problems. 

Test Materials. The sentences were those used
in Experiment 1, with the exception of the neutral
stimuli. Thus each stimulus list included 40
sentences, 20 congruent and 20 incongruent. In
the "mixed" presentation the 40 sentences were
randomized and presented in one block. In the
"blocked" presentation congruent and incongruent
sentences were clustered separately in two blocks
of 20 sentences each. The sentences in each of the
two blocks were randomized.
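
The difference between the two presentation modes can be sketched as follows. This is an illustration only: the sentence labels are placeholders, and the function `make_presentation`, its parameters, and the fixed random seed are assumptions rather than the procedure used to prepare the tapes.

```python
import random


def make_presentation(congruent, incongruent, mode, congruent_first=True, seed=0):
    """Return a list of blocks; each block is a list of (sentence, condition).

    mode="mixed":   one block with all sentences shuffled together.
    mode="blocked": two homogeneous blocks, each shuffled internally.
    """
    rng = random.Random(seed)
    con = [(s, "congruent") for s in congruent]
    inc = [(s, "incongruent") for s in incongruent]
    if mode == "mixed":
        block = con + inc
        rng.shuffle(block)
        return [block]
    if mode == "blocked":
        rng.shuffle(con)
        rng.shuffle(inc)
        return [con, inc] if congruent_first else [inc, con]
    raise ValueError(f"unknown presentation mode: {mode!r}")


# Placeholder sentence labels: 20 congruent and 20 incongruent sentences.
congruent_sentences = [f"C{i:02d}" for i in range(20)]
incongruent_sentences = [f"I{i:02d}" for i in range(20)]

mixed = make_presentation(congruent_sentences, incongruent_sentences, "mixed")
blocked = make_presentation(congruent_sentences, incongruent_sentences, "blocked",
                            congruent_first=False)
print(len(mixed), [len(b) for b in mixed])
print(len(blocked), [len(b) for b in blocked])
```

The `congruent_first` flag corresponds to counterbalancing which block a subject starts with in the blocked presentation.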

A target appeared only once in each list (with 
the exception of sentences of Type 4, see above). 
Across lists, each target appeared equally in the 
congruent and incongruent conditions.

Procedure. A different group of 30 subjects was tested
with each presentation mode. Subjects were
randomly assigned to one of the lists, so that each
subject was exposed equally to syntactically
congruous and incongruous sentences.

The mixed presentation followed the same 
experimental procedures as in Experiment 1. The 
experimental list was preceded by a mixed list of 
12 practice sentences (6 congruent and 6 
incongruent). 

In the blocked presentation, 15 subjects began
with the congruent block, and 15 with the
incongruent block. Each block was preceded by 8
practice sentences in the respective congruity
condition. No special instructions were given before
the incongruous block, but the "peculiar" structure
of the sentences was not denied in reply to queries
raised by the subjects following practice with
incongruous sentences (as was true for the mixed
condition as well).

Results 

The percentage of correct identification of 
targets was averaged for each subject and target 
in each congruity condition. Separate means were 
computed for each presentation group. The
percentage of correct identification of
syntactically congruent targets was almost
identical in the blocked and the mixed
presentation groups. In contrast, more
incongruent targets were identified in blocked
than in mixed presentation (Figure 2).

The statistical significance of the observed
differences was tested by two-factor analyses for
subjects (F1) and for stimuli (F2). The factors were
Congruity condition (congruent, incongruent) and
Mode of presentation (mixed, blocked). Both main
effects were significant [F1(1,58)=486.7, MSe=123,
p<.0001, F2(1,59)=128.7, MSe=937.3, p<.0001, and
F1(1,58)=18.1, MSe=192, p<.0001, F2(1,59)=21.6,
MSe=296, p<.0001, for the Congruity and Mode of
presentation effects, respectively]. The most
interesting result, however, was the significant
interaction between the two factors, revealing that
presenting congruent and incongruent sentences
in separate blocks improved the identification of
incongruent targets, but had no effect on
congruent targets [F1(1,58)=25.6, MSe=123, p<.0001,
F2(1,59)=21.9, MSe=256, p<.0001].

Errors in Experiment 2 were categorized and
analyzed using the same types as elaborated in
Experiment 1 (Table 2).

Figure 2. The percentage of correctly identified congruent and incongruent targets in the mixed and blocked
presentation conditions.

Table 2. Mean percentage of errors (SEm) of each type in each congruity condition in the mixed and blocked
presentation modes.

                                        ERROR TYPE
                         SELF           RANDOM                        NO
CONGRUITY                CORRECTION     COMPLETION     NONSENSE       RESPONSE

CONGRUENT     MIXED                     63.7% (5.7)    0.4% (0.4)     33.9% (5.2)
              BLOCKED                   56.6% (4.3)    2.9% (1.4)     37.1% (4.4)

INCONGRUENT   MIXED      17.3% (1.8)    22.2% (2.3)    4.0% (1.2)     56.4% (3.3)
              BLOCKED    13.9% (1.8)    30.2% (3.6)    11.8% (2.1)    43.4% (4.1)


In the congruent condition the distribution of
errors was similar for mixed and blocked
presentation modes (the interaction was not
significant, F(2,116)<1.0). Errors were unevenly
distributed among types [F(2,116)=70.9, MSe=729,
p<.0001]. The pattern of this distribution was
similar to Experiment 1: There were significantly
more random completion than no response errors
(p<.01). In the incongruent condition, on the other
hand, there was a significant interaction between
the distribution of errors among the types and the
mode of presentation [F(3,174)=5.2, MSe=1532,
p<.01]. Post hoc analysis (Tukey-A) revealed that,
although significantly fewer self correction than no
response errors were made in both presentation
modes (p<.01), the difference was larger in the
mixed than in the blocked presentation.

Discussion 

The present results revealed that manipulating 
the proportion of congruent and incongruent 
sentences in the experimental list affects only the 
inhibitory component of the syntactic priming 
effect. In comparison to a mixed presentation (1:1 
proportion), the presentation of incongruent and 
congruent sentences in separate blocks reduced 
the amount of inhibition without altering the 
amount of facilitation. Assuming that this 
manipulation influences primarily strategic 
components, the present results suggest that 
syntactic priming includes attention-mediated 
mechanisms that are reflected more in its
inhibitory than its facilitatory effects.



An attention mechanism that might have been 
affected by our manipulation is the strategic 
process of generating context-based expectations 
about the target's syntactic form. The application 
of this strategy should probably be encouraged by 
a high proportion of congruent sentences in a 
mixed list and discouraged by frequent syntactic 
incongruence. Hence, the tendency to generate 
expectations (leading to less identification of 
incongruent targets) should decrease in parallel to 
the reduction of the percentage of congruent 
sentences in the list. An informal comparison of the
percentage of correctly identified incongruent
targets across Experiments 1 and 2 conformed to
this prediction: Incongruent targets were
identified least (14.3%) in the mixed condition of
Experiment 2, where 50% of the sentences were
congruent, more in Experiment 1 (27.3%), where,
due to the neutral condition, only 33% of the
sentences were congruent, and most in the
blocked condition of Experiment 2 (35.3%), where
there were no congruent sentences. In contrast,
the proportion of congruent sentences did not
affect the percentage of correctly identified
congruous words significantly (69.3%, 74.8%,
69.8% in the mixed presentation of Experiment 2,
Experiment 1, and the congruent block of
Experiment 2, respectively). This suggests that the
facilitatory component of the syntactic priming
effect is less sensitive to strategically mediated
processes.
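
The informal cross-experiment comparison can be laid out as a small tabulation. The sketch below uses only the percentages quoted in this paragraph; treating the congruent block as a list with 100% congruent sentences, and the two helper functions, are assumptions made for illustration. It is a descriptive check, not a statistical test.

```python
# (percent of congruent sentences in the list, percent correct identification)
INCONGRUENT_TARGETS = [(0, 35.3), (33, 27.3), (50, 14.3)]
CONGRUENT_TARGETS = [(33, 74.8), (50, 69.3), (100, 69.8)]


def declines_with_proportion(points):
    """True if accuracy strictly decreases as the congruent proportion grows."""
    ordered = [score for _, score in sorted(points)]
    return all(a > b for a, b in zip(ordered, ordered[1:]))


def spread(points):
    """Range of accuracy across the three list compositions."""
    scores = [score for _, score in points]
    return max(scores) - min(scores)


print("incongruent declines:", declines_with_proportion(INCONGRUENT_TARGETS))
print("congruent declines:  ", declines_with_proportion(CONGRUENT_TARGETS))
print(f"incongruent spread: {spread(INCONGRUENT_TARGETS):.1f} points")
print(f"congruent spread:   {spread(CONGRUENT_TARGETS):.1f} points")
```

The incongruent targets show a monotonic decline over a range of about 21 points, whereas the congruent targets vary by about 5.5 points with no consistent direction.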

Additional support for our interpretation is
provided by the distribution of errors among the
different types. A comparison between the mixed
and blocked presentation modes revealed that the
percentage of random and nonsense errors (those
that reflected less concern about the priming
sentence) was higher in the blocked than in the
mixed presentation mode, whereas the opposite
trend was observed for no response and self
correction errors (that reflect the influence of the
priming effect induced by the syntactic structure
of the sentence). Hence, it appears that the syntactic
context influence on word identification was
reduced in the blocked relative to the mixed
presentation mode. The restriction of this
interaction to the incongruent condition is in
agreement with our hypothesis that the generation
of expectations is one of the factors involved in
producing the syntactic priming effect on word
identification.

It is worth noting that the present results 
diverge from the results of Stanovich & West 
(1983), who found that the pattern of contextual 
(semantic) effects was not altered by increasing 
the proportion of congruent targets. This 
divergence may either suggest a fundamental 
difference between the involvement of attention in 
semantic and syntactic context effects, or that our
manipulation of blocking the congruity conditions was
more powerful than changing the proportion of
congruent and incongruent targets within a mixed
block.

In Experiment 3 we used a different method to
manipulate the subjects' tendency to generate
expectations as a strategy of word identification,
in an attempt to corroborate the differential
involvement of attention with the facilitatory and
inhibitory components of the syntactic priming
effect.

EXPERIMENT 3

In contrast to Experiment 2, where our 
manipulation was meant to discourage the 
generation of expectations for specific syntactic 
forms, in the present experiment we sought to 
encourage this strategy. 

Studies of semantic priming revealed that the
length of the inter-stimulus interval (ISI) [or the
stimulus onset asynchrony (SOA)] between the
context and the target influences the relative
weight of the attention-based component of the
priming effect with single-word (Antes, 1979;
Neely, 1977) and sentence contexts (Stanovich &
West, 1979). Different ISIs were used in different
studies and the general consensus among authors
is that, within a limited range of times, the
tendency to use context-based expectations
increases with longer ISIs. Possibly, at longer ISIs
the subject has more time to process the context
and generate such expectations.

The influence of the ISI between context and
target on syntactic context effects is not as clear.
For example, using a lexical decision task with
printed Serbo-Croatian stimulus-pairs, Lukatela
et al. (1982) found significantly larger syntactic
priming effects when the SOA was 800 ms than
when it was 300 ms. However, with auditorily
presented stimuli (in Serbo-Croatian), Katz et al.
(1987) did not find a reliable interaction between
the length of the ISI (0 vs. 800 ms) and the
magnitude of the syntactic priming on lexical
decision. Despite the apparently divergent results,
both groups of authors suggested that the
syntactic context effect reflects the operation of an
autonomous automatic module rather than an
attention-mediated mechanism. However, as Katz
et al. (1987) pointed out, it is possible that this
conclusion holds only for the particular case of
inflectional morphology characteristic of
Serbo-Croatian. Indeed, indirect evidence for
non-automatic aspects of syntactic priming has been
found in English (Tanenhaus et al., 1979). Using
a naming task, these authors reported that at 0
ms SOA, subjects were insensitive to the specific
syntactic (and semantic) form of the prime,
whereas at 200 ms, the targets were facilitated
only by appropriate forms. Summarizing these
results, Tanenhaus et al. (1979) suggested that at
longer SOAs, syntactically inappropriate forms
are inhibited by a veiled controlled process (see
Shiffrin & Schneider, 1977). The time course of
the controlled process, however, was obscured by
the finding that at 600 ms SOA, its effect was not
as evident as at 200 ms SOA. Together, the
previous studies cannot unequivocally support or
reject the existence of attention-mediated
components of the syntactic priming effect. An
additional step towards the clarification of the role
of attention in syntactic priming can be made by
distinguishing between the effects of ISI manipulation
on the inhibitory and facilitatory components of
syntactic priming.

In the present experiment we used two ISIs.
One was set at the normal speech rate, and the
other was 350 ms. On the basis of the results of
Experiment 2, we anticipated that the ISI
manipulation should affect primarily the inhibitory
component. More specifically, we predicted that
at the longer ISI, syntactic incongruity should
have a more deleterious effect on the identification
of targets than at normal speech rate, whereas the
facilitatory effect should not change.

Method

Subjects. Sixty subjects participated in this 
experiment. Thirty were the mixed presentation 
group from Experiment 2. The other 30 were naive
undergraduates who did not take part in the
previous experiments, and participated in this 
experiment for course credit or for payment. All 
the subjects were native speakers of Hebrew, 
without any known hearing problems. 

Stimuli and Design. The stimuli lists were those
used in the mixed presentation condition of
Experiment 2. The only alteration was the
introduction of a silence period of 350 ms between
the offset of the last unmasked word in the
context and the onset of the masked target. The
30 naive subjects were tested with these lists. 
Their performance was compared to the 
performance of the mixed presentation group in 
Experiment 2, who heard the same lists at a 
normal speech rate. Each subject was exposed 
equally to syntactically congruous and 
incongruous sentences. Thus, the subject analysis 
was a mixed model ANOVA. The ISI effect was 
tested between groups and the syntactic congruity 
effect within subjects. 

Across subjects, each target appeared equally in 
the congruent or incongruent conditions, and at 
each ISI. Thus the stimulus analysis was 
completely within stimulus. 

Procedure. The experimental procedure of the
present experiment, in which we tested only the
longer ISI condition, was the same as that followed
in the mixed presentation condition of Experiment
2. The test list was preceded by 12 practice
sentences that included the silence interval.

Except for being informed about the brief silence
period preceding the masked target word, the
subjects were instructed identically as in the
mixed presentation condition of Experiment 2.

Results 

The percentage of correct identification of
targets was averaged for each subject and each
target in each congruity condition. These results
were compared to the percentage of correct
identification of congruent and incongruent
targets at normal speech rate in the mixed
presentation condition of Experiment 2 (Figure 3).
Congruent targets were identified almost
identically in the two ISI conditions. In contrast,
the percentage of incongruent targets
identified was smaller in the 350 ms ISI
condition than at normal speech rate.

The statistical significance of the observed
differences was tested by two-factor analyses
(mixed model for subjects (F1) and repeated
measures for stimuli (F2)). Both the congruity and
ISI main effects were reliable [F1(1,58)=848.1,
MSe=123, p<.0001, F2(1,59)=232.7, MSe=880,
p<.0001, for the congruity effect and F1(1,58)=5.2,
MSe=159, p<.0264, F2(1,59)=7.412, MSe=268,
p<.0085 for the ISI effect]. The most important
result, however, was the reliable interaction
between the two factors, revealing that the 350 ms
silence interval reduced the identification of
incongruent targets, but had no effect on
congruent targets [F1(1,58)=4.1, MSe=123,
p<.0488, F2(1,59)=4.4, MSe=211, p<.0411].

The distribution of errors in the different ISI
conditions is presented in Table 3.

Table 3. Mean percentage of errors (SEm) of each type in each congruity condition with normal speech rate and with
350 ms ISI between context and target.

                                        ERROR TYPE
CONGRUITY                SELF           RANDOM                        NO
CONDITION                CORRECTION     COMPLETION     NONSENSE       RESPONSE

CONGRUENT     NORMAL                    63.7% (5.7)    0.4% (0.4)     33.9% (5.2)
              350 ms                    53.1% (5.2)    1.8% (1^)      45.1% (4.8)

INCONGRUENT   NORMAL     17.3% (1.8)    22.2% (2.3)    4.0% (1.2)     56.4% (3.3)
              350 ms     15.2% (1.3)    17.9% (2.5)    1.9% (0.9)     65.4% (3^)


The ISI manipulation influenced the distribution 
of errors in the incongruent condition 
[F(3,174)=2.72, MSe=209, p<.05], but not in the 
congruent condition [F(2,116)=2.15, MSe=833, 
p>.12]. Across conditions the distribution of errors 
was similar to that observed in the former two 
experiments and significant [F(3,174)=179.3, 
MSe=209, p<.0001 and F(2,116)=61.2, MSe=833, 
p<.0001, in the incongruent and congruent 
conditions, respectively]. Post hoc analysis (Tukey-A) of 
the interaction revealed that, while no-response 
errors were more abundant in the 350 ms ISI 
condition than with normal speech rate, the 
percentages of the other three error types were lower 
in the 350 ms ISI condition than at normal speech rate. 
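
The direction of these shifts can be checked with simple arithmetic on the incongruent-condition means of Table 3. The following is only an illustrative sketch: the percentages are the table means as read from the source, and the dictionary keys and the function name `isi_shift` are hypothetical labels introduced here, not part of the original analysis.

```python
# Mean error percentages for the incongruent condition, as read from Table 3.
# Keys and names are illustrative assumptions, not the authors' own labels.
normal_rate = {
    "self_correction": 17.3,
    "random_completion": 22.2,
    "nonsense": 4.0,
    "no_response": 56.4,
}
isi_350 = {
    "self_correction": 15.2,
    "random_completion": 17.9,
    "nonsense": 1.9,
    "no_response": 65.4,
}


def isi_shift(baseline, delayed):
    """Return the change (delayed minus baseline) in percentage points per error type."""
    return {name: round(delayed[name] - baseline[name], 1) for name in baseline}


shift = isi_shift(normal_rate, isi_350)
for error_type, delta in shift.items():
    # A positive value means the error type was more frequent with the 350 ms ISI.
    print(f"{error_type}: {delta:+.1f}")
```

Only the no-response category yields a positive difference; the other three error types yield negative ones. The sketch describes the means only and says nothing about the reliability of the differences, which is what the analysis of variance and the post hoc test address.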

Discussion 

Increasing the ISI from a normal speech rate to 
350 ms between the context phrases and the 
targets reduced the percentage of correct 
identification of incongruent targets but had no 
influence on the identification of congruent 
targets. These results confirmed our previous 
observations that the facilitatory and inhibitory 
components of the syntactic priming effect are 
differentially sensitive to the manipulation of 
attention-based strategies of word identification. 

In Experiment 3, as well as in Experiment 2, 
our manipulation affected only the inhibitory 
priming component, albeit in opposite directions 
in the two experiments. Therefore, these results 
suggest that in both experiments we manipulated 
the same attention-mediated priming process. 
Assuming that this process involves the 
generation of context-based expectations, the 
results of both experiments support our 
distinction between an inhibitory component of 
syntactic priming, which reflects an attentional 
process of generating expectations, and a 
facilitatory component that is less reliant on 
attentional mediation. 

The distribution of errors is in complete 
agreement with the above interpretation. Again, 
the ISI manipulation influenced the distribution 
of errors only in the incongruent condition. 
However, the trend of this interaction was 
opposite to that found in Experiment 2. Whereas 
discouraging the generation of context-based 
expectations in the blocked, relative to the 
mixed, presentation mode increased the 
percentage of random and nonsense error types, 
encouraging such a strategy by introducing a 
longer ISI led to a decrease in such errors while 
increasing the percentage of no-response errors. 

Despite the correspondence between the results 
of the two experiments and the coherence of the 
emerging picture, the ISI manipulation should be 
considered with caution. Previous studies of the 
time course of sentence-context effects on word 
perception are not conclusive. For example, 
Fischler and Bloom (1979; 1980) presented 
written sentences word by word, manipulating the 
presentation rate. Contrary to our results, they 
found almost no facilitation of lexical decision for 
expected target words while the inhibition of 
incongruous targets was evident at all 
presentation rates. Their conclusion was that the 
effect of the sentence semantic context on word 
recognition is limited to an inhibitory postlexical 
process. This inhibition is probably related to the 
sentences' semantic incongruity and is not 
sensitive to the manipulation of ISI. A closer look 
at their data, however, reveals that, in agreement 
with our results, the magnitude of the inhibition 
effect on lexical decision (speed and accuracy) was 
twice as large at the slower rates (4 and 12 words 
per second) as at the higher presentation rates 
(20 and 28 words per second). 

One problem in analyzing the ISI effect is that 
different studies manipulated different time 
intervals. It is possible that the ISI influence is 
not monotonic, and that it differs with factors 
such as task, presentation modality, and the 
linguistic context that is investigated. It is, 
therefore, possible that relatively small 
differences in the particular ISIs compared in 
different studies account for the variation in the 
results. The results of two pilot experiments that 
preceded the present study support this 
possibility. In these pilot experiments we explored 
the effect of 500 and 1000 ms ISI compared to 
normal speech rate. The effect of syntactic 
priming at these ISIs was not reliably different 
from that at normal speech rate. An interesting trend 
emerged, however, across the ISIs. At 1000 ms, the 
increase in the inhibition was accompanied by a 
decrease in the facilitation. Relative to 1000 ms, 
the 500 ms ISI caused a smaller decrease in the 
magnitude of the facilitation and an even bigger 
increase in the inhibition effect. Finally, as 
reported in the present experiment, the 350 ms ISI 
had no effect on the magnitude of the facilitatory 
component while significantly increasing the 
magnitude of the inhibition. Thus, it appears that 
the interaction between the ISI and the syntactic 
context effect is limited to a specific range. This 
limit might also account for the absence of a 
difference between the syntactic congruity effect 
at 0 and at 800 ms ISI in the Katz et al. (1987) study. 
Despite the caution, however, the present results 
suggest that ISI manipulation, when carefully 
applied, may reveal interesting aspects of the 
context effects. 

The inherent problems of ISI manipulation are 
not essential, however, to our conclusions 
regarding the involvement of attention in 
mediating syntactic priming effects. Therefore, we 
may resume our discussion of the relation between 
attention mechanisms and the syntactic context 
effects as revealed in the present study. 

GENERAL DISCUSSION 
In the present study we examined the inhibitory 
and the facilitatory aspects of syntactic priming as 
it is reflected in the identification of auditory 
masked targets that were presented as last words 
in clearly displayed sentences. In Experiment 1, 
we found evidence for both components. In 
addition, the data indicated that, at least for the 
present experimental conditions, facilitation and 
inhibition contribute equally to the syntactic 
priming effect. In Experiments 2 and 3 we found 
that manipulation of attention-related factors 
affected the magnitude of the inhibition but had 
no effect on facilitation. The presentation of 
congruent and incongruent sentences in separate 
blocks attenuated inhibition relative to a mixed 
condition. On the other hand, Experiment 3 
suggests that the insertion of 350 ms of silence 
between the target and the context amplified the 
inhibition relative to normal speech rate. 

Across experiments, the scarcity of 
self-corrections and the abundance of no-response 
errors relative to random completions in the 
incongruent condition, on the one hand, and the 
increased percentage of random completion errors 
at the expense of no-response errors in the 
congruent condition, on the other hand, rule out 
the possibility that the variation in the percentage 
of correct identification between the different 
congruency conditions simply reflected a strategy 
of intelligent guessing on the basis of partially 
identified information. Taken together, our results 
indicate that the syntactic context effects observed 
in the present study were probably related to a 
post-lexical syntactic analysis of the input, whose 
possible nature is discussed below. 

In accord with the commonly held account of 
attention-mediated factors in semantic priming 
(Fischler, 1977; Fischler & Bloom, 1979; Neely, 
1977; Stanovich & West, 1981, 1983), we suggest 
that our manipulations influenced an 
attention-based mechanism that mediates the generation of 
expectations. However, the concept of generating 
expectations in the syntactic domain cannot 
simply be an extension of the models suggested to 
account for attention mediation in semantic 
priming. 

When analyzing discourse, the subject naturally 
expects that the input will be coherent with 
his/her existing linguistic knowledge (deGroot, 
Thomassen & Hudson, 1982; Fischler & Bloom, 
1980). We assume that this strategy is applied in 
the syntactic as well as in the semantic domain. 
Specifically, we assume that when a particular 
syntactic structure is alluded to by the context, the 
perceiver expects grammatical forms that can be 
integrated into that structure. The observed 
inhibition of incongruent targets in the present 
study might have been caused by the violation of 
those expectations. Possibly, incompatible input 
induces a second pass analysis of the target and/or 
context. This additional process may delay or, 
when the target is degraded, may suppress its 
identification. 

In contrast to previous studies of syntactic 
priming (e.g., West & Stanovich, 1986), we found 
that the identification of congruent targets was 
facilitated relative to a neutral condition. 
Regardless of the particular reasons for this 
discrepancy (that have been discussed in 
Experiment 1), we may speculate on possible 
sources of this facilitation. In the semantic 
priming domain, facilitation is assumed to reflect 
two different processes: One is an automatic 
spreading of activation among nodes related in the 
semantic network (Collins & Loftus, 1975). The 
second is the confirmation of an explicit prediction 
regarding target identity (Becker, 1980). Because 
the existence of a syntactically organized network is 
supported neither by empirical evidence nor by 
theoretical considerations, the mechanism of 
spreading activation is an improbable source of 
facilitation in syntactic priming. Therefore the 
facilitation of syntactically congruent targets is 
better explained by a process that relies on ad 
hoc generated syntactic structures. The 
mechanism of explicit prediction suggested by 
Becker (1980) cannot be directly applied to 
syntactic priming because, in our study, the 
identity of the target could not be predicted by the 
context (for similar claims see also Oden & Spira, 
1983; Tanenhaus et al., 1979; Tyler & Wessels, 
1983). Therefore, we are led to believe that the 
mechanism of facilitation by syntactic priming is 
based on the same form of expectations as 
postulated to account for the inhibition of 
incongruent targets. Thus, we propose that the 
same expectations may be used by different 
mechanisms to exert both facilitation and 
inhibition on the identification of the target. The 
first, which was postulated above, causes the 
inhibition of incongruent targets. The second may 
facilitate the identification of congruent targets 
either because expected structures may assist the 
integration of the sentence, or because they reduce 
the amount of sensory input needed for the 
identification of a word.³ 

The above proposal that, in syntactic priming, 
both the facilitation and the inhibition are based 
on a similar process of generating expectations 
apparently contradicts the results of 
Experiments 2 and 3, which showed that 
manipulating the tendency to generate 
expectations affects only the inhibition. This 
contradiction can be resolved by assuming that the 
generation of expectations at the sentence level is 
motivated by the natural assumption of syntactic 
coherence (similar claims related to the processing 
of sentences at the semantic level were made by 
deGroot (1982) and by Fischler & Bloom, 1980). It 
is conceivable that the tendency to generate 
expectations is not under strategic control. This 
view is compatible with the residual inhibition 
observed in the incongruent block, which suggests 
that despite the clear incongruent structure of all 
sentences, the initial expectations could not be 
completely avoided. Hence, at the sentence level, 
the expectations are probably generated by a 
veiled controlled process which uses only minimal 
attention resources (Schneider & Shiffrin, 1977). 
Such a process probably stands at the basis of the 
facilitatory mechanism of syntactic priming. On 
the other hand, as discussed above, when the 
same expectations are violated by incoherent 
input, attention is mobilized to trigger and control 
the additional, post-lexical process of 
re-evaluation, which we suggest is the main 
mechanism of the inhibition. Consequently, 
strategic changes should influence the magnitude 
of the inhibition, but have only minimal effect on 
the facilitation. Indeed, the interaction between 
the distribution of errors and the presentation 
procedure found in Experiments 2 and 3 only in 
the incongruent condition supports this view. 
Attenuating the tendency for re-evaluation of 
context-based expectations (in Experiment 2) 
reduced no-response and self-correction errors and 
increased the percentage of random and nonsense 
responses. On the other hand, facilitating the 
generation of context-based expectations (in 
Experiment 3) increased subjects' uncertainty as 
manifested by the increase in the no-response type 
errors. Should this process influence lexical access 
rather than post-lexical re-evaluation, the 
opposite manipulations on subjects' strategies in 
Experiments 2 and 3 should have had an effect on 
the overall percentage of correct identification in 
the incongruent condition but not on the 
distribution of errors. Our hypothesis that a 
similar attention mechanism is the basis of both 
the facilitation and the inhibition of performance 
in the syntactic priming task also implies that the 
allocation of attention, at least in language 
processing, is not an all-or-none phenomenon. 
Rather, based on data-driven or pre-determined 
strategies, different amounts of attentional 
resources are directed to the different aspects of 
language perception processes. 

A caveat of the above discussion is that the task 
used in the present study required the 
identification of degraded stimuli. Therefore, it is 
possible that the magnitude of the syntactic 
priming was largely dependent upon these 
conditions. In particular, the inhibition might 
have been much smaller if the auditory input had been 
clear. The need for the re-evaluation could have 
been less conspicuous in the absence of auditory 
uncertainty. However, we believe that by using 
degraded stimuli we were able to tap mechanisms 
of top-down processing of syntax that are 
available to the language speaker. 

FOOTNOTES 

*Cognition, in press. 

†Department of Psychology and School of Education, The Hebrew 
University, Jerusalem. 

¹Take, for example, the sentence "A nice boy eats," which 
translated into Hebrew would sound "yeled yafeh ochel." The 
morphological unit "yeled" (boy) contains information about 
gender (masculine) and number (singular). The same root with 
different affixes is used to form the word "yaldah" (girl) or 
change the number. The agreement rule requires that the 
attributes and predicate agree with the subject in gender 
and number: "yafeh" (nice) is a singular masculine form, as is 
"ochel" (eats). The sentence "yaldah yafah ochel" is a possible 
syntactic violation of that sentence because the predicate is in 
masculine form while both the subject and attribute are in 
feminine form. 

²This particular ISI was chosen on the basis of pilot studies. In the 
present study we were concerned to demonstrate the ISI effect 
on the two components of the syntactic priming effect and not to 
examine the precise time course of the putative controlled 
component. Therefore we examined different ISIs (1000 ms, 
500 ms, and 350 ms), but fully analyzed only the last, which 
had the most conspicuous effect. 

³A similar model was proposed within the frame of the cohort 
theory (Marslen-Wilson, 1980). According to this model the 
syntactic context may facilitate word identification by limiting 
the size of an initial cohort to those members which belong to a 
single form-class category (Tyler & Wessels, 1983). 

Starting on the Right Foot* 
A review of Marilyn Jager Adams' Beginning to Read: 
Thinking and Learning about Print 

Donald Shankweiler† 



Marilyn Jager Adams has performed a valuable 
service to all who wish to improve how reading is 
taught. Her book presents a comprehensive and 
scientifically [illegible] treatment of problems of 
immense social importance, problems that, partly 
because of their very complexity, are too often 
treated cavalierly. This book is required reading 
for professionals engaged in research on design 
and assessment of programs of reading instruction 
and research on diagnosis and treatment of 
reading disability. It is also a valuable resource 
for a wider readership in psychology, cognitive 
science and education. Indeed, anyone who needs 
a clear-headed synthesis of relevant research 
findings bearing on the problems of learning and 
teaching to read can profit greatly from this book. 
With unusual thoroughness, Adams has reviewed 
the mass of research literature that bears on the 
debate between advocates and adversaries of the 
code emphasis in reading instruction. The tone is 
always constructive. She avoids the rancor that so 
often accompanies discussion of these issues. 
Though even-handed in her treatment, Adams 
does not wrap herself in the cloak of the eclectic; 
after sifting the evidence, she draws strong 
conclusions and states them boldly. 

This book originated with a mandate from the 
United States Congress for a new appraisal of the 
place of phonics in teaching children to read. 
Inundated with complaints about the performance 
of the schools in imparting literacy, and confused 
by the welter of conflicting voices from the 
experts, Congress enacted legislation that led 
ultimately to the U.S. Department of Education's 
commission of this report. Responsibility for 
producing the report was placed in the hands of 
the Center for the Study of Reading, University of 
Illinois at Urbana-Champaign. Adams, a cognitive 
and developmental psychologist at the Center's 
branch at Bolt, Beranek and Newman in 
Cambridge, Massachusetts, was chosen for the 
task. 

Given Adams' extensive background in 
investigation of basic reading processes, she was a 
logical choice and the choice proves to have been 
an excellent one. Charged with the responsibility 
for presenting a thoroughgoing clarification of the 
issues that divide the two sides in what Jeanne 
Chall has called "the great debate," Adams was 
given a free hand to shape the report. A panel 
consisting of well-known reading experts from 
around the nation was assembled to offer advice 
and criticism of interim drafts, but the book was 
written by Adams, not the committee. And to her 
great credit, the book is highly readable. It has 
none of the dryness one often finds in a technical 
report. The book displays a graceful and informal 
writing style and betokens an uncommon ability 
to use the language well. 

As Adams points out, this book has a 
predecessor: the task of reviewing the relevant 
research literature was undertaken in the 1960s 
by Jeanne Chall whose report was published 
nearly 25 years ago (Chall, 1967). Appropriately, 
Adams often refers to the earlier work. It, too, was 
a praiseworthy review, but time does not stand 
still. The unprecedented technological explosion in 
the work place presents ever greater demands on 
reading skills. Moreover, the crisis in the schools 
has intensified, and consensus on a remedy for the 
unacceptably high rate of illiteracy in our society 
seems as elusive as ever. 

In the meantime, research activity has 
mushroomed both in quantity and in variety. An 
important new development since Chall's book 
appeared is the rediscovery of reading as a central 
problem for investigation by mainline psychology. 
No less significant, reading and orthography have 
become major concerns within the fast-growing 
fields of applied linguistics and the psychology of 
language. One consequence of the remarkable 
surge in research on reading is obvious: Anyone 
who would undertake to review the literature 
must be prepared to digest and critically evaluate 
an enormous range of material. Accordingly, 
heavy demands are placed on a reviewer's 
knowledge and critical judgment. On the whole, 
Adams proves more than equal to the task. 

The report has six parts. Part 1 deals with the 
nature of writing systems, the origin of the 
alphabet and the place of word recognition in 
reading. Part 2 presents the rationale for 
approaches to instruction that emphasize phonics, 
and it reviews research that attempts to compare 
the efficacy of this approach with other 
approaches. Part 3 presents conceptions of reading 
from the standpoint of laboratory analysis of what 
skilled readers do. It presents a model of the 
reading process that encompasses each of the 
components of reading skill and their integration 
in the act of reading. Part 4 articulates the goals 
of instruction in reading from the standpoint of 
the analysis of the skills of the mature reader 
presented in Part 3. Part 5 discusses research on 
the processes involved in learning to read. Part 6 
summarizes the conclusions reached from the 
review of the research literature and discusses the 
implications for teaching and learning to read. 

Adams begins with a discussion of the nature of 
writing. It is noted that true orthographies, unlike 
picture writing, represent words, and not meanings 
directly. This is an appropriate starting point 
because it underscores the key significance of the 
word in reading. The importance of apprehending 
each and every word in the text cannot be taken 
for granted, because it is unfortunately true that 
some popular programs of beginning reading 
instruction encourage the novice to skip words or to 
guess in the search for meaning. Adams leaves us 
in no doubt where she stands: This is bad advice 
for a beginning reader or anyone else. "Unless the 
processes involved in individual word recognition 
operate properly, nothing else in the system can 
either" (p. S). The ability to identify printed words 
is necessary but not sufficient for reading; it must 
be backed up by well-oiled mechanisms of lan- 
guage comprehension. Reading depends on a sys- 
tem of skills whose components must mesh 
properly. 

Alphabetic forms of writing are codes on the 
phonological structure of the language, or more 
properly, the morphophonological structure. By 
using letters to represent the several dozen 
consonant and vowel sounds of the language, 
alphabets achieve their great advantages over 
other forms of writing: First, economy — a small 
set of symbols is sufficient to represent any and all 
words in the language; second, transparency — a 
user who knows how the system works can 
usually recognize words in print that were 
previously known only through spoken language. 
Adams' account notes that these advantages come 
at a cost that must be borne by the beginner. 
Every alphabetic system presents its users with a 
problem of cognitive penetrability. Because vowels 
and consonants are co-produced and overlapped in 
time, these abstract phonemic units are not 
realized in speech as physically separable chunks 
of sound. That is probably one reason why they 
are often difficult to apprehend consciously 
(Liberman, Shankweiler, Fischer, & Carter, 1974). 
For the purposes of speaking and listening, 
language users need not attain awareness of 
phonemes. But to grasp the principle (by which 
alphabetic writing represents the phonemes and 
morphophonemes of the language), a would-be 
reader must first identify the speech units that 
the letters represent. Consequently, the grasp of 
the alphabetic principle is a rather sophisticated 
intellectual achievement. 

Because the orthography of English is complex 
and often irregular, some commentators have 
overlooked that it is, nonetheless, essentially 
alphabetic. Adams does not make that mistake. 
Yet to dwell on the irregularities, as she does at 
the end of Chapter 2, is to invite a reader who is 
less than astute to draw the wrong conclusion and 
to miss the larger point: that there is a system to 
be learned and that, even in English, knowledge of 
the orthography is productive. 

The chapters that follow present a much needed 
and thoughtful analysis of the pertinent 
information on phonics and reading. As for 
phonics, the term itself has long been a source of 
confusion. For the most part, Adams uses the term 
simply to denote instruction aimed at instilling 
the alphabetic principle. Well and good. But 
unfortunately the term has other connotations 
that are hard to shake off: In the minds of some 
people, phonics denotes an old-fashioned and 
discredited method of teaching reading by having 
children attempt to recognize a word by speaking 
the "sound" of each letter. The method implies 
that what a reader does is to approach words 
piecemeal by translating the letters that make up 
a word into their phonetic equivalents, letter by 
letter, as though reading were simply spelling 
aloud. Thus the term phonics has come to 
represent an inapt caricature of the reading 
process. Accordingly, Liberman and Liberman 
(1990) recommend substituting for phonics Chall's 
term, code-based approach. 

As Isabelle Liberman (who is cited by Adams on 
this point) often explained, letter-by-letter 
encoding is assuredly not what a successful reader 
does. The word bat contains one syllable, not 
three; the word is not buh-a-tuh but bat. Yet some 
beginning readers will say something like 
"buh-a-tuh" when asked to read the word and will never 
manage to discover that the word is bat 
(Liberman, 1973). In Adams' words, "[I]t is as 
though these children can find no connection 
between the sequence of sounds they have 
produced and the highly familiar word which they 
have 'read.' It is not enough to have memorized 
the sounds that go with each letter. To make use 
of those sounds, the child must realize that they 
are the subsounds of language" (p. 208). 
Beginners who are stuck in this way can be helped 
to develop phonological awareness, that is, to 
become aware of the phonological structure of 
words, by identifying their phoneme and syllable 
constituents. Then they are prepared to grasp the 
alphabetic principle and can begin to build word 
recognition skills on a solid foundation. As Adams 
notes, experienced readers parse the letter 
strings, ordinarily apprehending sequences of 
letters that correspond to a demi-syllable at 
minimum. According to laboratory research 
discussed in Part 3, such sequences constitute the 
major spelling patterns that experienced readers 
implicitly recognize as wholes. 

Spelling patterns must be not only apprehended 
but also overlearned to the point that word 
recognition can become unhesitating and 
automatic. Speed, as well as accuracy, is 
important because the fast-fading short-term 
memory forms the stage for the integration of 
words into syntactic units. If word decoding 
routines work poorly, all other aspects of reading 
will be hampered and comprehension will be 
correspondingly poor, a point often stressed by 
Perfetti and his associates (Perfetti, 1985). Thus, 
although word recognition per se is not the goal of 
reading, getting the meaning of the text depends 
on it. And word recognition, in turn, depends on 
accurate identification of the lower-level building 
blocks: the letters and the spelling patterns 
formed by letter combinations. 

In Part 3, Adams sketches a model of reading 
that derives largely from the work of Seidenberg 
and McClelland. The chief characteristic of this 
model is that information the reader derives from 
print interacts freely and at every level with 
stored knowledge. Thus the model contrasts with 
a hierarchical model in which information flow is 
largely unidirectional and bottom-up. Other 
researchers have maintained that an interactive 
model does not readily account for the important 
differences between reading and speech 
perception. Above all, it offers no explanation of 
the fundamental fact that speech is acquired by 
every neurologically normal child whereas reading 
skill is far from universally acquired. For some 
researchers, a unidirectional model seems dictated 
by the modular nature of the language apparatus 
(see Crain, 1989; Fodor, 1983; Shankweiler & 
Crain, 1986). Of course the question is not 
whether linguistic input (whether speech or print) 
must make contact with stored knowledge, but 
how and when. The modular view supposes that 
processing within the language module is 
accomplished before the linguistic input is 
integrated with other aspects of cognition. On this 
account, it is emphasized that word recognition by 
ear is privileged in the sense that it is served by 
mechanisms that evolved in our species and that 
form part of a coherent biological specialization for 
language. In contrast to speech, the alphabet is an 
artifact. Learning to use it is a cognitive task in a 
way that primary language acquisition is not. It 
has been argued that an adequate theory of 
reading would have to explain the difficulty of 
reading and the comparative ease of acquiring a 
spoken language (Liberman, 1989). 

After examining the myriad studies comparing 
programs for the teaching of beginning reading, 
Adams concludes that the great majority of 
program comparison studies indicate that 
approaches that incorporate code-based 
instruction "...result in comprehension skills that 
are at least comparable to, and word recognition 
and spelling skills that are significantly better 
than, those that do not" (p. 49). This, she notes, is 
exactly the same conclusion that Jeanne Chall 
drew 25 years earlier. Code-based approaches that 
help the beginner to appreciate that words have 
an internal phonological structure and to 
recognize that word spellings represent that 
structure have the edge over programs that pass 
over these aspects. 

While stressing that these program comparisons 
are essential, and have been highly informative, 
Adams is sensitive to the limitations of these 
research studies and in Chapter 3 she 
knowledgeably discusses the reasons why they so 
often yield noisy data. The classroom teacher, who 
is charged with implementing the program, is 
often the weak link. Adams' conviction that 
successful readers must grasp the alphabetic 
principle and that code-based teaching is the best 
way to help beginners to grasp it stems only in 
part from such program comparisons. At least as 
important are other research findings which are 
discussed in detail in this book. The pertinent 
evidence comes from a variety of sources: It 
includes the findings of research on prereaders, 
prediction studies seeking to identify those 
preschoolers who are at risk for reading failure, 
follow-up studies on the long-term educational 
consequences of failing to crack the code in the 
early primary grades, studies identifying the 
shared characteristics of unsuccessful readers, 
and finally, the picture of reading derived from 
research on the skilled reader. Adams concludes 
that all these lines of evidence converge in 
underscoring the vital importance of helping 
children grasp the alphabetic principle from the 
beginning. This entails giving prereaders 
adequate preparation for learning to read by 
instilling phonological awareness (introducing, 
through well-chosen word games, the fact that 
words have an internal phonological structure), 
and by demonstrating to beginning readers, 
through examples, how the spelling of a word 
represents its phonology. 

Of course, some children will infer the principle 
with little guidance from anyone and will make 
rapid progress in word recognition skills. But for a 
significant minority, which includes some children 
from highly favorable home backgrounds as well 
as many from unfavorable home environments, 
extensive instruction is needed to compensate for 
what appears to be a general weakness in the 
phonological component of language. 
Unfortunately, these are the very children who 
are often deemed unable to profit from such 
instruction and are therefore denied access to it. 

If the case for code-based instruction is 
unassailable, why, then, is it so often resisted? 
Adams ponders this question near the end of the 
book. She is inclined to think that the reason is 
that it is often poorly implemented in practice. 



Implementation, she notes, depends on clarity 
with respect to goals; the teacher must 
understand why each activity is included. "It is 
with respect to principles and goals that I would 
most strongly fault the major reading curricula" 
(p. 423). Certainly, one cannot disagree that it is 
vitally important for teachers to understand what 
they are attempting to accomplish through their 
teaching, and that a recipe book or a manual, no 
matter how logically ordered and detailed, will not 
impart that knowledge. The problem will not be 
easy to solve. There is much ignorance concerning 
the needs of beginning readers both on the part of 
teachers and teachers of teachers. Adams' book 
takes many constructive steps toward remediation 
of ignorance about reading. Let it be read and 
reflected upon in every place where teachers of 
reading are taught, and may it shine like a 
beacon! 
Null Subject vs. Null Object: Some Evidence from the 
Acquisition of Chinese and English* 
Since young English-speaking children use null subjects systematically, it has been 
proposed that they begin with the initial parameter setting allowing null arguments 
(NAs), and must change this setting on the basis of linguistic evidence that adult English 
prohibits NAs. A recent proposal suggests that the licensing and identification of NAs used 
by English-speaking children is like that used in adult Chinese. This predicts that young 
Chinese- and English-speaking children should exhibit parallel performance in their use of 
NAs. This study investigated this prediction using an elicited production task with both 
Chinese- and English-speaking children. Although the hypothesis that early English 
allows null subjects was upheld, the evidence is against the claim that early English is a 
discourse-oriented language like Chinese: while the Chinese children systematically used 
null objects, the American children did not. An alternative analysis of the use of null 
arguments is suggested. 



1. INTRODUCTION 

1.1 The Null Subject Phenomenon in Early 
Child Language 

The null subject phenomenon, i.e., the frequent 
absence of lexical subjects, is one of the most 
noticeable characteristics of early child language. 
The following (non-imperative) English sentences 
(1a) and (2a), spoken by children aged from 1;8 to 
2;5 (cited by Hyams, 1983), are examples of this 
phenomenon. 
(1) a. Read bear book             b. Kathryn read this 
       Ride truck                    Gia ride bike 
       Want look a man               I want take this off 

(2) a. Outside cold               b. ('It's cold outside') 
       No morning                    ('It's not morning') 
       Yes, is toys in there         ('Yes, there are toys in there') 

In the examples in (1a) the subject, though not 
phonologically specified, has a definite reference 
which can be readily inferred from context. Since 
sentences with null subjects like those in (1a) 
co-occur with sentences like those in (1b), which do 
have lexical subjects, it is not likely that the 
missing subjects in (1a) can be attributed to a 
performance constraint on sentence length. A 
further characteristic of children's speech at this 
age is illustrated by the examples in (2a). In these 
examples the unexpressed subject is an expletive, 
as shown by the 'translations' of these sentences 
in (2b). However, according to Hyams, children at 
this age do not produce sentences such as (2b). 

Additional studies have documented children's 
early use of subjectless sentences both in 
languages which allow null subjects and in those 
which do not, including Italian (Hyams, 1986), 
German (Clahsen, 1989; Weissenborn, in press), 
French (Weissenborn, in press), and American 
Sign Language (Lillo-Martin, 1986, 1991). In all of 
these studies, it has been found that at an early 
age children use subjectless sentences like the 
ones illustrated in English above. 

The search for an explanation of children's early 
use of subjectless sentences can be related to 
studies of adult languages which permit such 
sentences as grammatically acceptable, by 
comparison to those which do not. In the next 
section, we review some characteristics of the null 
subject phenomenon in adult languages (since we 
include null objects as well as null subjects, the 
term has been generalized to 'null arguments'), 
and one proposal for the grammatical mechanisms 
underlying this phenomenon. We will then turn to 
a proposal accounting for children's use of null 
subject sentences which appeals to this analysis of 
adult language. 



1.2 The Null Argument Phenomenon in 
Adult Languages 

The null argument phenomenon is a well-known 
characteristic of adult languages such as Spanish, 
Italian and Chinese. Examples from these 
languages are given in (3). The English 
counterparts to these sentences require overt 
subjects. 

In these so-called 'pro-drop' languages, the 
expletive elements equivalent to English it and 
there are also phonologically null, as illustrated 
in (4) (Italian, from Hyams, 1983), and (5) 
(Chinese). 

In adult Chinese, the expletive element 
equivalent to English it can be phonologically null 
as in Spanish or Italian, as illustrated above (5a, 
b, c). Alternatively, a non-expletive subject can be 
found in any of these sentence types, illustrated in 
(6a,b,c). 



(3) a. Mangia come una bestia.          (Italian; Hyams, 1983) 
       '(He/she) eats like a beast.' 

    b. Come como una bestia.            (Spanish; Hyams, 1986) 
       '(He/she) eats like a beast.' 

    c. [e] lai-le.                      (Chinese; Huang, 1982) 
           come-ASP 
       '(He/she) came.' 



(4) a. Sembra che Gianni sia matto. 
       '(It) seems that John is crazy.' 

    b. Piove oggi. 
       '(It) rains today.' 

(5) a. [e] Xiayu-le. 
       (It) rain-ASP 
       '(It) is raining.' 

    b. [e] Yao xiayu-le. 
       (It) going to rain-ASP 
       '(It) is going to rain.' 

    c. [e] Kanshangqu [e] yao xiayu-le. 
       (It) seem (it) going to rain-ASP 
       '(It) seems that (it) is going to rain.' 
(6) a. Tian xiayu-le. 
       sky rain-ASP 
       lit., 'The sky is raining.' 

    b. Tian yao xiayu-le. 
       sky going to rain-ASP 
       lit., 'The sky is going to rain.' 

    c. Tian_i kanshangqu [e_i] yao xiayu-le. 
       sky seem going to rain-ASP 
       lit., 'The sky seems to be going to rain.' 



How can one account for the occurrence of null 
arguments in these languages, compared to 
languages which prohibit null arguments, such as 
English? Jaeggli and Safir (1989) proposed the 
following Null Subject Parameter, stated in (7), as 
a principle of Universal Grammar (UG) to make 
this distinction. 

(7) The Null Subject Parameter 

Null subjects are permitted in all and only 
languages with morphologically uniform 
inflectional paradigms. 
(Jaeggli and Safir, 1989, p. 29). 

According to Jaeggli and Safir, a morphological 
paradigm is uniform if all its forms are 
morphologically complex or none of them are. For 
example, the Italian inflectional paradigm 
consists entirely of morphologically complex 
forms, hence null subjects are allowed; in Chinese, 
no forms are morphologically complex, hence null 
subjects are allowed here too. In the case of 
English, however, morphologically complex forms 
such as walks, walked, walking, coexist with 
morphologically simple forms, such as walk. Thus 
English is a 'mixed' system and null subjects are 
prohibited. 
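
The uniformity criterion can be stated procedurally. The following Python sketch is an illustration only: the function name and the hand-annotated toy paradigms (each form marked as morphologically complex or simple) are assumptions made here, not part of Jaeggli and Safir's formulation, and the check covers only the licensing condition in (7), not identification. The Italian forms are the present tense of parlare, and the single Chinese verb stands in for a paradigm with no inflected forms.

```python
def is_morphologically_uniform(paradigm):
    """Return True if every form is complex or every form is simple.

    `paradigm` maps each verb form to True (morphologically complex,
    i.e. stem plus affix) or False (bare stem).
    """
    flags = list(paradigm.values())
    return all(flags) or not any(flags)


# Toy paradigms, annotated by hand for illustration (hypothetical coding).
italian = {"parlo": True, "parli": True, "parla": True,
           "parliamo": True, "parlate": True, "parlano": True}
chinese = {"lai": False}  # no inflected verb forms
english = {"walk": False, "walks": True, "walked": True, "walking": True}

for name, paradigm in [("Italian", italian), ("Chinese", chinese),
                       ("English", english)]:
    licensed = is_morphologically_uniform(paradigm)
    print(f"{name}: null subjects licensed by uniformity = {licensed}")
```

On this coding, Italian and Chinese count as uniform and English as mixed, matching the classification in the text.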

The Null Subject Parameter stated in (7) tells us 
when a null subject is possible. However, Jaeggli 
and Safir (following others such as Rizzi, 1986) 
also propose that a null subject can occur only 
when its referential value can be recovered. They 
propose three mechanisms for the identification of 
null arguments: (i) local AG(reement) including a 
tense feature, (ii) a c-commanding nominal, or (iii) 
a Topic. Failure to satisfy either of the two 
conditions, which are individually necessary and 
jointly sufficient, namely, a morphologically 
uniform paradigm and a recoverable referential 
value for the thematic null subject, will result in 
the prohibition of null subjects in a language. 
Although the use of null arguments thus requires 
two conditions to be met, for ease of exposition we 
will refer to a Null Subject (or Argument) 
parameter with settings [±pro-drop]. (This also 
enables us to be neutral with respect to other 
analyses of the null argument phenomenon.) 

The use of local AG to identify the reference of a 
null argument follows from numerous reports in 
the literature linking null arguments with 'rich' 
agreement. Early reports were confined to 
languages with only subject-verb agreement (such 
as Italian, discussed in Rizzi, 1982); these 
languages allow null arguments identified by 
agreement only in subject position. Later studies 
(such as McCloskey and Hale's 1984 work on 
Irish) have demonstrated that languages with 
other types of agreement often display null 
arguments in other positions. Jaeggli and Safir 
add the condition that a tense feature must be 
present in order to account for the lack of null 
arguments in German and other V2 (verb-second) 
languages. The null arguments which are 
identified by AG are considered to be members of 
the empty category pro, [+pronominal, 
-anaphoric]. 

The use of a Topic to identify null subjects 
follows from Huang's (1984; 1989) proposal. 
Huang distinguishes "discourse-oriented" 
languages from "sentence-oriented" languages. 
The "discourse-oriented" languages, like Chinese, 
have a rule of "topic-chaining" by which the 
discourse topic is grammatically linked to a null 
sentence topic which in turn identifies a null 
argument. This null argument is a variable left 
from the movement of the empty topic to sentence 
topic position. According to Huang, a topic may 
bind a variable in either subject or object position. 
These two kinds of null arguments are illustrated 
in (8). 

In addition, there is a third method of identifying 
null arguments which results in a subject/object 
asymmetry. Because a c-commanding NP can also 
be an identifier, in languages like Chinese a null 
pronominal (pro) may be found in embedded 
subject position, as in (9a), but not in object 
position, as in (9b). This distinction is found 
because the empty embedded subject can be 
identified by the matrix subject; it functions 
grammatically like a pronominal rather than a 
variable. However, the empty object cannot be 
identified by the matrix subject, since 
identification has to be by the closest nominal 
element. Thus, empty objects can only be 
identified by an empty topic, indicated by OP 
in (10). 

To summarize, Jaeggli and Safir proposed that 
the difference between the grammar of pro-drop 
languages such as Italian versus those such as 
Chinese is the method of identification of the null 
argument. This is illustrated in (11). 



(8) a. Discourse Topic_i [S' topic_i [S [e_i] [INFL lai-le]]] 
                                                   come-ASP 
       '(He) came.' 
       (Huang, 1984) 

    b. Discourse Topic_i [S' topic_i [S wo INFL [mei kanjian [e_i]]]] 
                                        I        not see     (him_i) 
       'I did not see (him).' 

(9) a. Zhangsan_i, ta_i shuo [e_i] mei kanjian Lisi.    (Huang, 1989) 
       Zhangsan he say no see Lisi 
       'Zhangsan_i, he_i said that (he_i) didn't see Lisi.' 

    b. *Zhangsan_i, ta_i shuo Lisi mei kanjian [e_i]. 
       Zhangsan he say Lisi no see 
       'Zhangsan_i, he_i said that Lisi didn't see (him_i).' 

(10) [OP_i [Zhangsan_j shuo [Lisi_k kanjian [e_i] le]]] 
     Zhangsan say Lisi see ASP 
     'Zhangsan_j said that Lisi_k saw him_i/*j/*k.' 

(11) a. [S pro_i [INFL AG_i/Tense]] 
        (identification by AG, Italian) 

     b. Discourse Topic_i [topic_i [S t_i [INFL]]] 
        (identification by a discourse topic, Chinese) 

     c. Subject_i verb [S pro_i VP] 
        (identification by a c-commanding NP, Chinese) 
1.3 Null Subjects in Children's Grammars 

From the above, it may be seen that 'early' 
English resembles a pro-drop language in three 
respects. First, lexical subjects are optional; 
second, the subject has definite reference even 
when phonologically null (except in the case of 
null expletives); and third, lexical expletives are 
absent (Hyams, 1983; 1989). 

How can one account for the development an 
English-learning child has to undergo in order to 
arrive ultimately at a steady state grammar so as 
to speak the right type of English? A recent 
analysis by Hyams (in press; Jaeggli & Hyams, 
1987), following the analysis of null subjects in 
adult languages by Jaeggli and Safir (1989) 
discussed above, proposed that the early 
grammar, like adult grammars, is constrained by 
the Null Subject Parameter cited above. That is, 
the early grammar satisfies the requirement of 
morphological uniformity and the requirement 
that null arguments be properly identified. 

Hyams argues that English-speaking children 
begin speaking a Chinese-like language, i.e., a 
discourse-oriented language. Under the child's 
initial analysis, English is morphologically 
uniform with uniformly simple forms. Hyams 
takes children's verb productions, which at this 
time are generally not inflected, as evidence for 
this position. She further proposes that young 
English-speaking children use null topics to 
identify the reference of their null subjects. The 
child will then need to learn that English is not a 
'discourse-oriented' language in order to properly 
exclude null subjects. 

In the case of Italian-speaking children, Hyams 
proposes that their early empty subjects are 
identified by AG(reement), as is the case in adult 
Italian. She proposes this early correct null 
subject use since Italian-speaking children acquire 
the inflectional system fairly early. Thus, for these 
children resetting of the null subject parameter is 
not required. 

One potential problem for Hyams' analysis is 
that one would expect that a discourse-oriented 
child language should have both null subjects and 
null objects, since under topic identification the 
null subject and null object phenomena are 
grammatically equivalent. However, according to 
the data she reviewed, Hyams claimed that 
English-speaking children do not use null objects. 
In order to account for this, Hyams thus proposed, 
following Roeper, Rooth, Mallis, and Akiyama 
(1984), that in the early grammar, the inventory 
of null elements includes pro, but not variables. 



This hypothesis would predict a null subject/null 
object asymmetry. Since null objects can only be 
variables, under this hypothesis null objects would 
not be allowed in the early grammar until some 
later point when variables mature. In order for 
this account to hold, Hyams must depart from 
Huang's analyses of Chinese, and suggest that 
matrix empty subjects as well as embedded empty 
subjects can be pro, although only embedded 
empty subjects can be identified by a 
c-commanding NP. Hyams says that matrix empty 
subject pros are identified by a discourse topic. 

According to Hyams' hypothesis, Chinese-speak- 
ing children, who will ultimately acquire a real 
discourse-oriented language, should first exhibit 
the same null subject/null object asymmetry as 
English-speaking children, and they should not 
produce null object structures until the point 
when they develop variables. Hyams' hypothesis 
would also predict one of two null subject-object 
asymmetries for English-speaking children. On 
the one hand, if they have not yet reset the Null 
Subject Parameter by the time that they acquire 
variables, then they will produce only null sub- 
jects early on, but will later include null objects as 
well once they have developed variables. On the 
other hand, if the English-speaking children have 
reset the null subject parameter before they de- 
velop variables, they will never use null objects. 
Thus, knowing when English- and Chinese-learn- 
ing children use null subjects and objects com- 
pared to when they develop variables is important 
for evaluating Hyams' proposal. 
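
The predictions just described reduce to a small decision table. The sketch below is one way to write it out; the function name and the boolean encoding of a child's developmental state are assumptions introduced here for illustration and are not part of Hyams' proposal.

```python
def predicted_null_arguments(discourse_oriented_target, parameter_reset,
                             has_variables):
    """Null-argument types predicted under the account sketched above.

    discourse_oriented_target: True for Chinese, False for English.
    parameter_reset: the child has reset the Null Subject Parameter
        (only possible when the target is not discourse-oriented).
    has_variables: variables are available in the child's grammar.
    """
    if not discourse_oriented_target and parameter_reset:
        return []                      # adult-like English: no null arguments
    allowed = ["null subject"]         # null subjects are pro
    if has_variables:
        allowed.append("null object")  # null objects can only be variables
    return allowed


# Early stage in either language: null subjects only.
print(predicted_null_arguments(True, False, False))
# Chinese child after variables develop: null subjects and null objects.
print(predicted_null_arguments(True, False, True))
# English child who acquires variables before resetting the parameter.
print(predicted_null_arguments(False, False, True))
# English child who has reset the parameter: neither.
print(predicted_null_arguments(False, True, True))
```

The third and fourth calls correspond to the two possible orderings for English-speaking children, which is why the relative timing of variables and parameter resetting matters for the evaluation.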

The evidence regarding the timing of use of 
variables versus resetting the null subject param- 
eter is not wholly consistent with Hyams' ap- 
proach. Roeper (1986) gives evidence that children 
have some uses of variables by age three to four 
years. All of his evidence for the use of pros rather 
than variables with wh-questions occurs with 
older children (ages 8 to 10) and long-distance 
questions. However, his proposal that children use 
pros instead of variables even at this later age can 
also be questioned, given new evidence regarding 
children's very early comprehension and produc- 
tion of wh-questions and strong crossover con- 
structions (see Thornton, 1990). We therefore used 
the production and comprehension of wh-questions 
in the study reported here as evidence for the 
existence of variables in children's grammars. 

The timing of the use of null subjects is easier to 
determine. The acquisition data Hyams used to 
support her hypothesis indicate that the 
restructuring of the Null Subject Parameter takes 
place around 26 to 28 months. If Hyams' proposal 
that young children do not have variables is true, 
then we will not expect to see any null objects in 
the production of English-speaking children, since 
the restructuring takes place prior to the 
development of variables; and of course a clear 
decline in their use of null subjects should appear 
following the resetting of the NA parameter 
around 2-1/2 years. However, if there is evidence 
that children do have variables while they still use 
null subjects (indicating that the resetting of the 
NA parameter has not yet taken place), then they 
will be expected to use null objects too, according 
to Hyams' account. 

In order to more fully evaluate Jaeggli and 
Hyams' proposals, we collected data on the 
acquisition of English and Chinese. The following 
experiment was designed to answer some relevant 
questions about Hyams' hypothesis through first- 
hand acquisition data. The questions we 
addressed include the following: 

i. Is a null subject/null object asymmetry 
exhibited in child Chinese and child English? If so, 
is it equivalent for the two groups? 

ii. If child Chinese or child English does exhibit 
null objects, do we have evidence that variables 
coexist with null objects? The emergence of wh- 
questions will be taken as evidence of acquisition 
of variables. 

iii. Can the presence of lexical expletives be 
taken by American children as evidence that 
English is not [+pro-drop]? The use of overt versus 
null expletives will be examined to address this 
question. 

iv. What does the developmental pattern look 
like, as far as the null subject and null object 
phenomena are concerned, in terms of the 
parameterized theory of UG? 

v. What is the influence of linguistic 
environment during development of early 
grammar between ages 2 - 4-1/2? 

2. Method 

2.1 Subjects 

2.1.1 Chinese and American children. Nine 
Chinese children, 4 female and 5 male, aged from 
2;0 to 4;6, participated in the experiment. All of 
them were learning some variety of Mandarin 
Chinese as their first language. Their parents 
were graduate students from either mainland 
China or Taiwan, studying in the United States. 
Nine English-speaking children, 5 female and 4 
male, aged from 2;5 to 4;5, were also tested using 
the same procedure. Their parents were members 
of the University community. All the subjects had 
normal hearing. There were no recorded 



developmental delays of any sort. Subject 
characteristics are given in Appendix 1. 

2.1.2 Chinese adult controls. Nine Chinese- 
speaking female adults participated in the 
experiment. They were all born in mainland 
China or Taiwan, speaking some variety of 
Mandarin Chinese. They were the mothers of the 
Chinese child subjects. 
2.2 Procedure 

2.2.1 Controlled production data collection. This 
part of the experiment was carried out in the 
experimenter's home for the Chinese children, and 
in the observation room at a day care center for 
the English-speaking children. Two story books 
were used. One was a story book designed 
by the experimenter (QW) about the daily life of a 
little boy named Baldy (who had no hair). A doll 
house with dolls and furniture corresponding to 
the settings and characters in the book was used 
to familiarize the subject with the main character. 
The other was a pop-up book, 'The Three 
Little Pigs.' The testing was carried out after the 
experimenter had played with the child subject a 
number of times and established rapport. The 
subject's task was to tell the experimenter the 
story. For the first story, the experimenter and the 
subject played with the doll house and dolls. Next, 
the subject was asked if he or she wanted to read 
a book about Baldy and then to tell a story about 
him. The answer was invariably positive. The 
entire procedure was audio recorded. All 
interaction with the Chinese-speaking children 
was conducted in Mandarin; that with the 
English-speaking children was in English. 

2.2.2 Eliciting expletive structures. In this part 
of the experiment, a number of pictures were 
displayed to the child subject and then he or she 
was asked to tell what happened in the pictures. 
This part of the experiment was designed to elicit 
expletive structures for the English-speaking 
children and to compare their productions to those 
produced by the Chinese-speaking children under 
the same situation. 

2.2.3 Adult controls. The Chinese adult subjects 
were asked to tell the stories and talk about the 
pictures, while pretending that they were talking 
to their own child. The testing was conducted in 
the subjects' home without their child or the 
experimenter present. The testing materials were 
identical to those prepared for the child subjects. 
The whole procedure was audio taped. 

2.3 Data reduction 

i. The mean percentage of sentences with null 
subjects for each speaker was calculated based on 
the ratios of the sentences with null subjects to 
the total number of sentences produced when 
telling the two stories. These ratios were averaged 
over the total number of subjects in each language 
group, over each age level (2-, 3-, and 4-year olds), 
and over each MLU level (3.5, 4.5, 5.25) 
separately. The standard error of the means (s.e.) 
was also calculated. 

ii. The mean percentage of sentences with null 
objects was calculated using a similar method. 
The ratio was the total number of sentences 
produced with a null object to the total number of 
sentences with an underlying structure of SVO. For 
the Chinese data, in addition to this criterion, any 
two-morpheme compounds which have been 
identified as a word by the authoritative dictionary, 
Xiandai Hanyu Cidian (Modern Chinese 
Dictionary) (Institute of Linguistics, Chinese 
Academy of Sciences, 1973), were not included, 
even if they had the V+O formation. For example, 
(12a) was identified as a single word, so it was 
excluded; but (12b) was counted because it was not 
identified as a single word. The reason for this 
constraint is that it is generally agreed among 
Chinese linguists that a verb+complement 
compound is not equal to the structure of V+O; 
unlike the latter, the former is already in its minimal 
construction and is not divisible; therefore, these 
two types of words are analyzed differently. 

(12) a. xi zao 
        wash bath 
        'take a bath' 

     b. xi shou 
        wash hands 
        'wash hands' 

Figure 1. Mean percentage of sentences with null subjects produced by Chinese and American children and Chinese 
adults. 



iii. The MLU for child subjects in both 
languages was calculated using the productions 
made for the stories, according to the method in 
Brown (1973). 

iv. A second measure of the mean percentage 
of sentences with null subjects for English- 
speaking children was also calculated in the same 
way, excluding the sentences with null subjects 
using a gerund or to-infinitive. The reason for this 
exclusion is that given the discourse, these kinds 
of sentences are also allowed in the adult 
grammar of English. This second measure is 
labelled 'adjusted' in the figures. 

v. The data gathered from testing the 
expletive structures was excluded from the 
calculation of the mean percentages. This part of 
the data was only evaluated for structural 
differences among the three testing populations. 
No quantitative analysis was involved. 

vi. The children's comprehension and 
spontaneous productions of wh-questions during 
the course of the study were evaluated, for the 
purpose of determining their use of variables. 
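
The measures in (i) and (iv) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' procedure: the per-child counts below are invented, the helper names are hypothetical, the standard error is taken to be the sample standard deviation divided by the square root of the number of children, and the adjusted measure is assumed to remove gerund and to-infinitive sentences from the numerator only.

```python
from math import sqrt
from statistics import mean, stdev


def percent(part, total):
    """Percentage of `part` out of `total` sentences."""
    return 100.0 * part / total


def mean_and_se(values):
    """Group mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical counts per child:
# (null-subject sentences, of which gerund/to-infinitive, total sentences)
counts = [(12, 7, 40), (20, 10, 50), (9, 6, 30)]

unadjusted = [percent(null, total) for null, _, total in counts]
adjusted = [percent(null - nonfinite, total)
            for null, nonfinite, total in counts]

unadj_mean, unadj_se = mean_and_se(unadjusted)
adj_mean, adj_se = mean_and_se(adjusted)
print(f"unadjusted: mean = {unadj_mean:.2f}%, s.e. = {unadj_se:.2f}")
print(f"adjusted:   mean = {adj_mean:.2f}%, s.e. = {adj_se:.2f}")
```

Each child contributes one percentage, and the group figure is the mean of those percentages rather than a pooled ratio, which is how the description in (i) reads.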

3. Results 

3.1 An Overall View of the Results (for 
details see Appendices 2 and 3) 

3.1.1 Null subjects. From Figure 1, it may be 
seen that there is a noticeable difference between 
the mean percentages of sentences with null 
subjects produced by Chinese child subjects and 
that by American child subjects at 2 - 4-1/2 years. 
Examples for such sentences are (13a,b) for the 
Chinese child subjects, and (14a,b,c) for American 
child subjects. 

[Figure 1 legend: Chinese children; American children (unadjusted); American children (adjusted); Chinese adults] 
Wang et al. 



(13) a. Zhe huang wawa tiaotiao. [e] shuai. [e] shuai dao le. 
        this yellow baby jump fall fall down ASP 
        'This yellow baby jumped. (He) fell. (He) fell down.' 
        (ZY, 2;0) 

     b. [e] wan shashi ne. [e] zang. [e] xi zaozao ne. 
        play sand NE dirty take bath NE 
        '(He) is playing with sand. (He) is dirty. (He) is taking a bath.' 
        (AN, 2;3) 

(14) a. [e] brush her hair. [e] brush hair. 
        '(She's) brushing her hair. (She's) brushing (her) hair.' 
        [e] fighting like that, bang! 
        '(They're) fighting like that, bang!' 
        [e] playing. They all bent. [e] are playing. 
        '(They are) playing. They (are) all bent. (They) are playing.' 
        (AR, 2;5) 

     b. He got in there. [e] fell down. 
        'He got in there. (He) fell down.' 
        (DS, 2;10) 

     c. [e] jumping. [e] fell. They fell down. [e] sleeping. 
        '(They're) jumping. (They) fell. They fell down. 
        (They're) sleeping.' 
        (SP, 4;2) 



The mean percentage of sentences with null 
subjects produced by Chinese children is 46.54% 
(s.e. = 3.78); while for the American children, it is 
33.11% (s.e. = 6.12). The Chinese adults produced 
sentences with null subjects 36.13% of the time. 
Given that Chinese is a pro-drop language, all the 
sentences with null subjects produced by the 
Chinese children are considered grammatical, 
with the reference of the null subject determined 
by the discourse topic. Although English is not a 
pro-drop language, some of the sentences with 
null subjects produced by American children, i.e., 
sentences with null subjects but using infinitives 
or gerunds rather than a full verb, can be judged 
as pragmatically acceptable in the given context in 
which they were produced. If we exclude these 
sentences from our count of sentences with null 
subjects produced by American children, the 
mean percentage drops to 14.58% (s.e. = 5.03). 
Comparing this adjusted mean percentage, 
14.58%, with the mean percentage of Chinese 
children, 46.54%, and that of Chinese adults, 
36.13% [one-way ANOVA omnibus F(2, 24) = 17.80, 
p = .0001], it is clear that Chinese children are 
dropping their subjects at a much higher rate 
than American children, and even a bit higher 
than the rate of the Chinese adults. The 
differences between the American children and 
the Chinese children, and between the American 
children and the Chinese adults, are both 
significant by Scheffé's tests [F(1, 24) = 31.96, 
p = .0001, and F(1, 24) = 21.55, p = .0026, 
respectively]; the difference between the Chinese 
children and the Chinese adults is not significant. 
Even so, it is clear that American children do 
drop subjects a substantial proportion of the time. 
For both groups of children, the null subject was 
sometimes clearly related to an antecedent from 
the discourse, as shown in examples (15, Chinese) 
and (16, English). In other cases, the referent of 
the null subject was not previously mentioned in 
the discourse, although it was usually 
understandable from the context; often, it was 
part of the pictures the children were describing. 
Some examples of this type are given in (17, 
Chinese) and (18, English). 

(15) a. Xiao zhuzhu zhu tangtang. 
        little piggy boil soup 
        'The little pig makes soup.' 
        [e] zhu tangtang. 
        (He) boil soup 
        'He makes soup.' 
        (WW, 2;5) 

     b. Da ye lang_i zai zheli tou kan. 
        big wild wolf_i ASP here secretly look 
        'The big wild wolf is here peeping secretly.' 
        [e_i] zai kan xiao zhu. 
        (It_i) ASP look little pig 
        'It is looking at the little pig.' 
        (HE, 3;1) 

(16) a. Look at this bad wolf. He got in there. [e] fell down. 
        'Look at this bad wolf. He got in there. (He) fell down.' 
        (DS, 2;10) 

     b. The big bad wolf coming again and bang the door. [e] want to 
        blow the house and the house is down. 
        'The big bad wolf (is) coming again and bang the door. (He) 
        wants to blow the house and the house is down.' 
        (SR, 2;8) 

(17) [e] kan jingjing. [e] mei chuan xiexie 
     (He) look mirror (He) not wear shoe 
     'He is looking in a mirror. He didn't wear shoes.' 
     [e] mei chuan wawa. 
     (He) not wear sock 
     'He didn't wear socks.' 
     (ZY, 2;0) 

(18) [e] jump up. [e] jump in bed. [e] fall down. 
     '(He) jumped up. (He) jumped in bed. (He) fell down.' 
     (AR, 2;5) 
Although both Chinese- and English-speaking 
children thus produced null subjects in a 
somewhat similar fashion, we believe this does not 
necessarily show that they use the same 
mechanism in identifying and licensing the null 
subjects. We will return to this point for further 
discussion. 

3.1.2 Null objects. From Figure 2, we may see 
that there is a considerable difference between the 
mean percentages of sentences with null objects 
produced by Chinese child subjects, which is 
22.53% (s.e. = 1.76), or by Chinese adults, 10.3% 
(s.e. = 1.58), and that by American child subjects, 
which is 3.75% (s.e. = 1.31) [one-way ANOVA 
omnibus F(2, 24) = 37.21, p = .0001]. Here, the 
differences between the American children and 
the Chinese children, the American children and 
the Chinese adults, and the Chinese children and 
the Chinese adults are all significant by Scheffé's 
tests [F(1, 24) = 18.781, p = .0001, F(1, 24) = 6.549, 
p = .0237, and F(1, 24) = 12.232, p = .0001, 
respectively]. With the Chinese children, only 
27.59% of the total sentences with null objects are 
ungrammatical. The grammaticality of the 
Chinese object-drop sentences (i.e., whether the 
null object was used properly) was judged with 
respect to the context in which the sentence in 
question was actually produced. For the American 
children, 100% of the sentences with null objects 
were ungrammatical. Examples are given in (19) 
for Chinese child subjects, and (20) for American 
child subjects. 
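
As a rough consistency check, the omnibus F for null objects can be approximated from the summary statistics alone. The sketch below is not the authors' analysis: it assumes nine speakers per group (as stated in the Method section), takes each reported s.e. to be the sample standard deviation divided by the square root of n, and uses a hypothetical helper name.

```python
def anova_f_from_summary(means, ses, n):
    """One-way ANOVA F for k equal-sized groups from means and s.e. values.

    Assumes s.e. = sd / sqrt(n), so each group variance is n * se**2.
    Degrees of freedom are (k - 1, k * (n - 1)).
    """
    k = len(means)
    grand_mean = sum(means) / k
    ms_between = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    ms_within = sum(n * se ** 2 for se in ses) / k
    return ms_between / ms_within


# Null objects: Chinese children, Chinese adults, American children.
means = [22.53, 10.3, 3.75]
ses = [1.76, 1.58, 1.31]
f_value = anova_f_from_summary(means, ses, n=9)
print(f"approximate F(2, 24) = {f_value:.2f}")
```

Under these assumptions the result comes out close to the reported value of 37.21, with the small difference attributable to rounding in the published means and standard errors.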
Figure 2. Mean percentage of sentences with null objects produced by Chinese and American children and Chinese 
adults. 



(19) a. *Ou,ldngliii cHi [e].                     (ungrammatical)
        oh, wolf come eat (it=pig)
        'Oh, the wolf came to eat (the pig).'
        (ZY, 2;0)

     b. *Tamen y^io qiu gki [e].                 (ungrammatical)
        they going to build (it=house)
        'They are going to build (a house).'
        (WW, 2;5)

     c. [e] kluikdn [e].                          (grammatical)
        (He=wolf) again look look (it=pig)
        '(He) had another look at (the pig).'
        (ZY, 2;0)

     d. [e_i] dnwan [e_j],                         (grammatical)
        (He=wolf) eat finish (it=pig)
        'After (he) finished eating (the pig),
        laol^g dim ji5u bi^ da le.
        old wolf belly then become big ASP
        the old wolf's belly became big.'
        (LX, 3;4)

(20) a. *Look at [e]. [e] go a little higher.     (ungrammatical)
        'Look at (him). (He) goes up a little higher.'
        (DS, 2;10)

     b. *The other little pigs worry about [e].   (ungrammatical)
        'The other little pigs worry about (him).'
        (ER, 3;8)




3.1.3 Null subject/null object asymmetry. Comparing Figure 1 with Figure 2, it may be seen that the null subject/null object asymmetry is not unique to the Chinese children. The ratio of the mean percentage of sentences with null objects to that of sentences with null subjects is 0.48, 0.23, and 0.24 for Chinese children, Chinese adults, and American children, respectively. If we recalculate the ratio for the Chinese children, excluding the ungrammatical sentences as in examples (19a and b) (which may be considered errors), the ratio decreases from 0.48 to 0.29. If we do the same for the English-speaking children, treating their small percentage of object dropping (3.57), all of which was ungrammatical, as errors, the ratio of course becomes zero.
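
The object-to-subject ratio can be illustrated with a minimal sketch. The null-subject means used here are assumptions drawn from elsewhere in the paper rather than from Figure 1 directly: roughly 15% for the American children (the figure quoted in the discussion of parameter resetting), and, for the Chinese children, the unweighted average of the three age-group means given in the breakdown by age. The function name is hypothetical.

```python
def object_subject_ratio(null_object_pct: float, null_subject_pct: float) -> float:
    """Ratio of mean % null-object sentences to mean % null-subject sentences."""
    if null_subject_pct <= 0:
        raise ValueError("null-subject percentage must be positive")
    return null_object_pct / null_subject_pct

# American children: 3.57% null objects; about 15% null subjects (assumed from the text).
american = object_subject_ratio(3.57, 15.0)

# Chinese children: 22.53% null objects; the null-subject mean is approximated as the
# unweighted average of the three age-group means (an assumption about group sizes).
chinese_subject_mean = (55.73 + 45.65 + 38.25) / 3
chinese = object_subject_ratio(22.53, chinese_subject_mean)

print(f"American children: {american:.2f}")  # 0.24
print(f"Chinese children:  {chinese:.2f}")   # 0.48
```

Under these assumptions the sketch recovers the reported ratios of 0.24 and 0.48. It does not attempt the recalculated 0.29, since the sentence counts behind that figure are not given in this section.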

The amount of null object use by the Chinese adults is surprisingly low. However, it is important to note that we believe the rate for Chinese adults would be higher than the rate we obtained if the data had been collected in an adult-to-adult conversational situation, where most object dropping takes place, rather than in children's storytelling. Because of this discrepancy, we conducted a follow-up study with Chinese adults.

In the follow-up study, five Chinese-speaking 
adults were interviewed by the experimenter in an 
adult-to-adult conversational setting. These adults 
were all women who had recently given birth to
their first child. The interviews took place in the
subjects' homes, and consisted of several parts.
First, the subjects were asked to tell their child
two stories as a warm-up. Then, they engaged
in conversation with the experimenter. The 
conversations all included the same three topics of 
discussion: the woman's pregnancy and childbirth, 
her own lifestyle, and the growth and behavior of 
her child. The interviews were tape-recorded. 
Only the conversations were transcribed and 
scored according to the same procedures discussed 
previously for the initial study. The percentages of 
null subject and null object used by each speaker 
in this study are illustrated in Figure 3, and more 
detailed information is given in Appendix 4. 

As Figure 3 clearly shows, a subject-object asymmetry remains for the adult subjects, but the overall percentage of null object use increases greatly. Both of these facts are important for comparison with the children's utterances. In the follow-up study, the average object drop is 40.1% (s.e.=1.77), while the average subject drop is 45.6% (s.e.=2.42). Although the amount of object drop is much higher than in the initial study (10.30%), the difference between subject drop and object drop is still significant by a two-tailed paired t-test (t=4.073, p=0.015). Some examples of the adults' utterances with subject and/or object drop are given in (21) and (22).
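
As a rough check on the reported significance level, the two-tailed p-value can be recomputed from the t statistic alone. This is a minimal sketch, assuming the paired test over five speakers has df = 5 - 1 = 4; the function names are hypothetical, and the integration method (Simpson's rule over the Student's t density) is chosen only to keep the example free of external libraries.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p-value: 1 minus twice the area under the density on [0, |t|]."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3
    return 1 - 2 * central_area

p = two_tailed_p(4.073, df=4)  # five speakers, so df = 4 (assumed)
print(f"p = {p:.3f}")
```

With df = 4 the result comes out at about 0.015, in line with the value reported above.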

Figure 3. Percentage of sentences with null arguments produced by Chinese adults in the follow-up study.

(21) Taiji6u he dianmunaij ma. [e_i] ye he [e_j] hH duo.
     He_i only drink little milk MA. (He_i) yet drink (it_j) not much.
     'He only drinks a little milk. (He) does not drink (it) much.'

     [e_i] zai he dian guozhi. [e_i] cbl dian shuiguo.
     (He_i) also drink a little bit juice. (He_i) eat a little bit fruit.
     '(He) also drinks a little bit of juice. (He) eats a little bit of fruit.'
     (LQ)

(22) Tai yi6u di kdn di^uisluj.
     She_i especially like watch TV_j.
     'She especially likes to watch TV.'

     Wf\t jidu pA tai fea yaifllng kdn huM-LE.
     I_k so afraid she_i BA eye watch bad-ASP.
     'I was so afraid that she might damage her eyesight.'

     [e_k] yi tian bu r£Lng tai kdn n^o jiou [e_j].
     (I_k) a day not let her watch that long (it_j).
     'I do not let her watch (it) for long in a day.'
     (TJ)

3.2 Results Broken Down by Age and by MLU

In order to determine whether there is any 
relationship between the null subject/null object
phenomena and the child's linguistic maturation, 
the results were recalculated according to the 
child's chronological age and the child's MLU 
level. 

For the American children, the adjusted mean percentages of sentences with null subjects are 25.89%, 4.48%, and 13.39% for age groups 2, 3, and 4, respectively. For the Chinese children, the mean percentages of sentences with null subjects are 55.73%, 45.65%, and 38.25% for these three age groups. Thus, in both languages, the proportion of subjectless sentences decreases over time. However, the American four-year-olds showed a surprising jump in the use of null subjects relative to the three-year-olds.

To investigate this further, the percentage of null subject sentences was recalculated on the basis of MLU. For the Chinese child subjects, MLU levels were in accordance with their chronological age groups. For the American child subjects, however, the 2- and 3-year-old groups had MLU levels corresponding to their 2- and 3-year-old Chinese counterparts, but the 4-year-olds had an MLU level corresponding to the Chinese 3-year-olds. Thus, the American 3- and 4-year-olds were grouped together in one MLU group for the comparison of null subjects across MLU.

Grouped by MLU, the American children produced subjectless sentences 25.89% of the time at MLU level 3.51 (2-year-olds) and 8.93% of the time at MLU level 4.48 (3- and 4-year-olds) (see Figures 4 and 5). The difference between the Chinese and American first MLU groups (2-year-olds) is not statistically significant (t=2.209, p=.09); however, as can be seen in Appendix 1, this is essentially due to the youngest American subject (AR), who had a rate of subject drop comparable to that of his Chinese peers. The difference between the second MLU groups (Chinese 3-year-olds and American 3- and 4-year-olds) is significant by unpaired two-tailed t-test (t=2.21, p=.0007). Clearly, the American children experience a sharp drop in their use of null subject sentences. The Chinese children, on the other hand, continue to use null subjects across the MLU groups (for the Chinese children, MLU groups are equivalent to age groups).
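
The regrouping of the American children by MLU can be sketched as follows. This is an illustration only: it assumes the 3- and 4-year-old groups are the same size, so that their pooled rate is a simple unweighted mean, which this section does not state. The variable names are hypothetical.

```python
from statistics import mean

# Adjusted mean % of null-subject sentences by age group (American children), from the text.
american_by_age = {2: 25.89, 3: 4.48, 4: 13.39}

# MLU grouping described in the text: 2-year-olds alone; 3- and 4-year-olds pooled.
mlu_groups = {"MLU 3.51": [2], "MLU 4.48": [3, 4]}

# Unweighted pooling: assumes equally sized age groups (an assumption, not from the source).
by_mlu = {
    label: mean(american_by_age[age] for age in ages)
    for label, ages in mlu_groups.items()
}

for label, pct in by_mlu.items():
    print(f"{label}: {pct:.3f}%")
```

Under the equal-size assumption the pooled value for the second MLU group agrees with the reported 8.93% up to rounding, and the first group is unchanged at 25.89%.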

Figure 4. Mean percentage of sentences with null subjects produced by Chinese and American children (by MLU, unadjusted) and Chinese adults.

The pattern of use of missing objects is quite different (see Figures 6 and 7). Whether divided by age or by MLU group, the American children used missing objects much less frequently than null subjects. The two-year-olds (MLU 3.51) used missing objects only 8.3% of the time, while the older children used essentially none. The Chinese children, in contrast, used null objects much more frequently than the American children. They averaged 20.2% to 26.0% null objects, with the figures increasing slightly over the age/MLU ranges. Although the adults in the initial study produced far fewer null objects than the Chinese children, the follow-up study shows that the children's overall production of null objects is approaching the level of use by adults in conversational settings.

Figure 5. Mean percentage of sentences with null subjects produced by Chinese and American children (by MLU, adjusted) and Chinese adults.

Figure 6. Mean percentage of sentences with null objects produced by Chinese and American children (by age) and Chinese adults.

Figure 7. Mean percentage of sentences with null objects produced by Chinese and American children (by MLU) and Chinese adults.



The Chinese- and English-speaking children do not differ significantly in their use of null subjects at the earlier MLU stage tested (MLU level 3.5), but they do at the later stage (MLU level 4.5). These results provide strong evidence for pro-drop in younger English-speaking children (MLU level 3.5). For the use of null objects, however, the two language groups differ significantly across all MLU levels. The differences in the use of null subjects and null objects by Chinese and American children indicate that the factors controlling the use of the two types of null arguments in the two groups are distinct. This is counter to the proposal by Jaeggli and Hyams (1987), which suggests that the two groups use null subjects for essentially the same reason.

3.3 Results of Eliciting Expletive Structures

In order to determine how the course of the development of expletive subjects interacts with the development of null versus overt subjects, children's productions of sentences calling for expletive subjects were examined. For the Chinese-speaking children, we examined whether they used a null subject as in (5) above, or a non-expletive lexical subject as in (6). For the English-speaking children, we examined whether they produced any lexical expletives and, further, whether there was any evidence that lexical and null expletives coexisted.

In general, there was no evidence of the Chinese children producing structures with overt non-expletive subjects, such as those in (6a, b, and c) above, even among the 4-year-olds. The only structures they used for the weather conditions were those with null subjects, as in (5a and b). They did not use the structure in (5c) either. The only exception occurred when they talked about a windy condition. In this case they either used a structure with a null subject, as in (23), or they used 'feng' ('wind') as an overt subject, as in (24). The Chinese adults used all the structures in (5) and (6). They also used 'feng,' the word for 'wind,' in the same way as the Chinese children. The observed difference between the Chinese children and the Chinese adults in their use of null subjects (as in 5a and b) and non-expletive lexical subjects (as in 6a and b) is, we believe, due to a stylistic reason rather than a grammatical one. In fact, the sentences in (5a and b) are more colloquial than those in (6a and b). However, it seems that the absence of a structure like that in (6c) from the data of the Chinese children is due to a grammatical reason. While the null subjects in (5a and b) can be interpreted as referential, the one in (6c) cannot. The structure in (6c) requires the ability to raise the subject from the embedded clause to the matrix clause.

The American children had a different pattern. Except for the youngest one (AR, 2;5), all the children showed some kind of evidence for the existence of expletive 'it,' as in example (25). At the same time, however, they also used some null expletives, as shown in examples (25) and (26).



(23) [e] yko bit zh&gegua di^,
     (it=wind) want (BA) this blow down,
     [e] hdi yko ba zh^geje gui di&o.
     (it=wind) also want (BA) this too blow down.
     '(The wind) wants to blow this down,
     (it) also wants to blow this down too.'
     (ML, 4;3)

(24) Xi^mzM gua 6ng4e.
     now blow wind-ASP.
     Fengdou tM d^-le,
     Wind also too big-ASP
     fj^ngzi dou chui dao-le.
     house also blow down-ASP
     'The wind began blowing now. The wind was so big
     that the house was blown down.'
     (SK, 4;1)



(25) It is raining. (SR, 2;8)

It's very windy so the clothes are going up. (SR, 2;8)
No rain. rain. They can't come out. (DS, 2;10)



(26) Snow. Raining (DS, 2;10) 

No snow. (SR, 2;8) 

Windy now. (EL,3;6) 

Raining. (AR, 2;5) 

Hyams (1986) suggests that one piece of evidence that English-speaking children use to reset the null subject parameter to [-pro-drop] is the presence of overt expletives. Hyams argues that since 'it' and 'there' are not being used for pragmatic purposes (because they do not contribute to the meaning of the sentence), they must therefore be present for strictly grammatical reasons. Hence, lexical expletives could be used to trigger parameter resetting. Furthermore, as noted above, Hyams found that children use null expletives at the time they use null subjects. So the emergence of lexical expletives coincident with restructuring to [-pro-drop] is predicted.

However, as our data show, some children do use both overt and null expletives at the time when they are using null subjects. Hence, it seems that the presence of overt expletives in the input is not a type of triggering data for resetting the null subject parameter. But why do the children use overt expletives when they sanction null subjects? Lillo-Martin (1987) has given a reasonable solution for this puzzle. She suggests that children have misanalyzed the expletives and instead interpret 'it' as referential, even in sentences like 'It's raining.' Because they have the wrong analysis of 'it,' they don't have the overt expletive evidence that English is not [+pro-drop]. So at this point, one cannot assume that the time at which a child starts using overt expletives will be coincident with the correct setting for the null subject parameter.

3.4 Results on the Use of Structures 
Exhibiting Variables 

In our data, both child language populations have shown some evidence for the existence of variables through the production of wh-movement (English), or the comprehension and production of wh-questions (Chinese). This can be seen in (27) and (28). These questions were produced and comprehended during the course of the experiment described above, at the same time as these children showed evidence of using null arguments.

One might claim, following Roeper et al. (1984), that the empty categories used in these constructions are pros, not variables. However, work by Thornton (1990) and Sarma (1991) suggests that children at least at 3 years do use variables rather than pros in these constructions, since they correctly produce long-distance questions and obey the strong crossover constraint. Therefore, we will assume that the empty categories used in the wh-questions shown in (27) and (28) are variables rather than pros. In any case, it is the difference between Chinese- and English-speaking children with respect to null objects, without a corresponding difference with respect to evidence for variables in the form of wh-questions, that is relevant to our discussion.



(27) a. What's that?
        (AR, 2;5)

     b. Who's that? Baldy? Baldy is playing with mud.
        (SR, 2;8)

     c. That's what I think he did.
        (DR, 3;9)

(28) a. Experimenter:  Shui l^i-le ?
                       Who came-ASP
                       'Who came?'
        Child subject: Lf^g, L^g Idi-le.
                       wolf, wolf came-ASP
                       'The wolf came.'
        (ZY, 2;0)

     b. Experimenter:  D&hOi Idng 8h6nmo Idi-le?
                       big grey wolf do what come-ASP
                       'Why did the big grey wolf come?'
        Child subject: [e] m xiao zha Ah .
                       (He) take little pig Ah!
                       '(He) came to take the little pig away, of course.'
        (AN, 2;3)

     c. shiskSmo? shishui ndngde?
        that is what? that is who did
        'What is that? Who did that?'
        (WW, 2;5)



4. DISCUSSION: THE PARAMETERIZED 
THEORY OF UG AND LINGUISTIC 
EVIDENCE 

A review of Figures 4 through 7 indicates the 
following: 

i. At the earliest age tested, 2 years old or 
average MLU of 3.5, both Chinese and American 
children are using null subjects. The Chinese 
children are also using null objects. Although the 
American children do have a few sentences with
null objects, the mean percentage of their 
sentences with null objects is only 3.57, so we will 
count these as errors; i.e., outside of the children's 
grammars. 

ii. For the Chinese children, as their MLU 
increases, the mean percentage of sentences with 
null subjects decreases, and the mean percentage 
of sentences with null objects increases. By the 
MLU level of 5.28, their subject-dropping rate is 
very close to that of Chinese adults, and their 
object-dropping rate is approaching that of the 
adults in the follow-up study. 

iii. For the American children, as their MLU 
increases, the mean percentage of sentences with 
null subjects (as well as sentences with null 
objects, which we are not counting as part of the 
children's grammar) decreases drastically, thus 
also coming in line with the corresponding adult 
grammar. 

iv. At each MLU level, both mean percentages are much higher for the Chinese children than for their American counterparts, although for the first MLU group (MLU level 3.5) the difference between the Chinese- and English-speaking children in their use of null subjects is not statistically significant.

How can the observation that as early as 2 years 
old both Chinese and American children are using 
null arguments be explained? It might be 
understandable that Chinese children do so 
because adult Chinese is a pro-drop language. 
But then why would the American children also 
do so, given that null arguments are not allowed
in adult English? On the other hand, how can the 
observed differences between Chinese and 
American children in the null argument 
phenomena be explained along developmental 
lines? 

If we adopt the idea that part of the formulation 
of UG is a system of parameters, and the initial 
setting for a particular parameter is the same for 
all children constrained by certain principles, then 
the observed phenomena can be explained. As 
discussed above in detail, the principles of UG 
may tell us when a null subject can occur and how 
it can be identified. The data we obtained support 
the hypothesis that English- and Chinese- 
speaking children at a very early age have a 
grammar which allows null subjects. 

We are left, however, with three important questions for discussion. First, how strong is the asymmetry we found between subject and object dropping in English compared to Chinese, and how can it be accounted for by parameter theory? Second, how does the child who begins with an incorrect parameter setting make the change to the adult grammar? Third, how does the linguistic environment make an impact on this parameter resetting?

4.1 On the Subject/Object Asymmetry

Our data did not confirm Jaeggli and Hyams' hypothesis with respect to null objects. Instead, our data indicate that while the Chinese-speaking children used null objects from as early as 2 years old (the youngest age tested), the English-speaking children by and large did not use null objects. This returns us to the potential problem with Jaeggli and Hyams' account discussed above. If English-speaking children have a Chinese-type language as their initial parameter setting, then we would expect children learning both languages to progress similarly in terms of the use of null objects. However, this was not the case.

We do not think that the null subject/null object asymmetry we found in Chinese- and English-speaking children's use of null objects can be accounted for by the non-existence of variables in early grammar. Both the Chinese- and the English-speaking children provided evidence for the emergence of variables. According to Hyams' hypothesis, the English-speaking children in this situation should use null objects at least as productively as the Chinese-speaking children do, but our data show that they do not. The small percentage (3.57) is really within the error range. If the English-speaking children have reset their null argument parameter at this point, they should have stopped using both null subjects and objects. Our data show that this is not the case: they continued to use null subjects but essentially no null objects even though they had acquired variables. At the same time, the Chinese-speaking children (who showed the same kind of evidence of variables) did use null objects productively.

As an alternative to Jaeggli and Hyams' hypothesis, we propose that there is more than a single parameter controlling the use of null arguments (following Lillo-Martin, 1986; 1991). One parameter, which can be called the Discourse Oriented Parameter (DOP) (following Huang, 1984), permits languages with discourse oriented properties to have both null subjects and null objects. These null arguments can be one of two types. Most are variables identified by a Discourse Topic. In embedded subject position there is also the option of pro, identified by a c-commanding NP. These null arguments correspond straightforwardly to two of the identification options proposed by Jaeggli and Safir, given in (11b and c) above. For learnability reasons, assuming parameter setting takes place on the basis of positive evidence, we might expect that the initial setting of the DOP is [-DO]. If so, the performance of the Chinese-speaking children in our study indicates that resetting of the DOP to [+Discourse Oriented] can take place early. Since other characteristics of discourse oriented languages, such as topic-comment structures and discourse-bound anaphors, can serve as evidence for determining this parameter setting, it is reasonable to assume that the Chinese-speaking children have made this setting and produce null subjects and null objects in accord with this grammatical option.

The second part of our proposal is that null arguments in adult languages like Italian are due to a separate parameter, which we will call the Null Argument Parameter. This parameter permits null arguments when licensed by certain Case-assigning maximal categories, following Rizzi (1986). These null arguments are empty categories of the type pro, identified by the person-, number-, and/or gender-features of the licensing category. Although subject-verb agreement is insufficient to license or identify null subjects in adult English, we take it that English-speaking children who use null subjects are doing so because of this parameter, rather than the DOP. The subject-object asymmetry is related to the cross-linguistic observation that object agreement is much less common than subject agreement; hence pro null objects are found in many fewer languages than pro null subjects. Children will universally posit an INFL category with the potential of being a licenser for empty subjects, but not for empty objects. Hence, children will universally begin with a null subject hypothesis. Changing the parameter setting to disallow null subjects will thus only take place after morphological agreement has been analyzed.
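
The predictions of the two-parameter proposal can be laid out as a small sketch. The class and field names are hypothetical, and the model is deliberately simplified: it treats each parameter as a single switch, ignores object-agreement languages, and sets the Null Argument Parameter for adult Chinese arbitrarily, since [+DO] already permits both kinds of null argument.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    name: str
    discourse_oriented: bool  # Discourse Oriented Parameter set to [+DO]
    null_argument: bool       # Null Argument Parameter allows INFL-licensed pro

    def allows_null_subject(self) -> bool:
        # Either route suffices: a topic-identified variable or an INFL-licensed pro.
        return self.discourse_oriented or self.null_argument

    def allows_null_object(self) -> bool:
        # In this sketch only the discourse-oriented option extends to objects.
        return self.discourse_oriented

grammars = [
    Grammar("adult Chinese", discourse_oriented=True, null_argument=False),
    Grammar("adult Italian", discourse_oriented=False, null_argument=True),
    Grammar("early English", discourse_oriented=False, null_argument=True),
    Grammar("adult English", discourse_oriented=False, null_argument=False),
]

for g in grammars:
    print(f"{g.name}: null subjects={g.allows_null_subject()}, "
          f"null objects={g.allows_null_object()}")
```

On this simplified picture, early English patterns with Italian (null subjects only) rather than with Chinese (null subjects and null objects), which is the asymmetry the proposal is meant to capture.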

Other proposals have been made arguing that 
the null subject phenomenon in early English is 
due to performance factors rather than a 
grammatical parameter setting (e.g., Bloom, 1990; 
Gerken, 1990; Mazuka, Lust, Wakayama, and 
Snyder, 1986). Although these suggestions are
worth considering, there is considerable cross-linguistic
evidence for treating the early null subject
phenomenon as a grammatical stage.
Performance accounts of the early null subject 
phenomenon do not make the same cross- 
linguistic predictions as grammatical accounts do. 
More cross-linguistic work can contribute to the 
resolution of this debate; but the data currently 
available support the grammatical account. For 
reviews of performance versus grammatical 
accounts, see Hyams and Wexler (1991) and Lillo- 
Martin (1991). 

4.2 Parameter Resetting

The evidence is quite strong that both Chinese- and English-speaking children have a grammar which allows null subjects at an early age, since they were both using null subjects even at the age of 2 (examples 12a and b and 13a and b). For the Chinese children, since the adult language allows null arguments, no change will have to be made in their parameter setting. However, for the English-speaking children, a parameter will have to be reset on the basis of evidence for [-pro-drop] from the linguistic environment. Our data show that roughly between the ages of 2 and 3, or MLU 3.5 to MLU 4.5, a drastic change has taken place in the English-speaking child's grammatical development. That is, during this transition the English-speaking children show a dramatic decline in the production of null subjects. It seems to be at this point that the parameter resetting has taken place.

How does this resetting occur? It is possible that the presence of overt expletives can be used as evidence that English is [-pro-drop], as discussed above. However, there are now some cross-linguistic data which indicate that the perfect correlation between overt expletives and [-pro-drop] which is needed for this kind of evidence does not exist (cf. Jaeggli & Hyams, 1987; Hyams, in press). Even if this positive evidence is unavailable, however, it is possible that indirect negative evidence can be used (Lasnik, 1989). Since the English-speaking child's initial setting is also [+pro-drop], he would, like the Chinese children, expect to hear sentences with null subjects. When the child fails to hear sentences with null subjects in English, this will then be taken as indirect negative evidence that such sentences are not allowed in his language and hence are ungrammatical. The incorrect positive parameter setting will then be replaced by the correct negative setting [-pro-drop].
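
The logic of resetting by indirect negative evidence can be sketched as a toy learner. Everything specific here is an illustrative assumption rather than part of the proposal cited above: the function name, the idea of counting consecutive overt-subject sentences, and the `patience` threshold are all hypothetical.

```python
def reset_by_indirect_negative_evidence(utterances, patience=50):
    """Return the index at which a [+pro-drop] learner resets to [-pro-drop], or None.

    utterances: iterable of booleans, True if the heard sentence has a null subject.
    patience:   hypothetical number of consecutive overt-subject sentences the learner
                tolerates before concluding that null subjects are disallowed.
    """
    run = 0
    for i, has_null_subject in enumerate(utterances):
        if has_null_subject:
            run = 0  # expectation confirmed; keep [+pro-drop]
        else:
            run += 1
            if run >= patience:
                return i  # expected null subjects never arrived; reset
    return None

english_like = [False] * 200           # input with no null subjects
chinese_like = [True, False] * 100     # null subject in every other sentence (toy input)

print(reset_by_indirect_negative_evidence(english_like))  # 49
print(reset_by_indirect_negative_evidence(chinese_like))  # None
```

The toy learner resets only when the expected null subjects fail to appear for long enough, so English-like input triggers the change while Chinese-like input never does.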

Note that our data do agree with some empirical data existing in the literature, which together may be taken as evidence for certain a priori, language-independent properties of early grammar hard-wired by parameters of UG. For instance, with our Chinese child subjects at MLU level 3.5, 20% of the transitive verb constructions were produced with null objects, which is very close to the 17% of similar constructions obtained in Japanese children (Mazuka et al., 1986). Also, for the American child subjects, the mean percentage of sentences with null subjects (15%) is very close to the percentage found in Gerken's imitation study (19%; subjects' mean age was 2;3; Gerken, 1990). Further, the dramatic decrease in the mean percentage of sentences with null subjects observed in our American children between ages 2 and 3 is consistent with Hyams' proposal of an inverse relationship between null subjects and the use of inflectional morphology. These studies all point to an initial [+pro-drop] setting, with resetting to [-pro-drop] for English-speaking children during the third year.

4.3 Effects of Linguistic Environment

What role does the linguistic environment play in this parameter-setting account of language development? Clearly, only data from the linguistic environment can trigger the resetting of a parameter, such as is needed for English-speaking children. However, the interaction between the child's initial setting of this null-subject parameter and the input of the child's linguistic environment seems to make itself felt even earlier and in more subtle ways than parameter resetting. Even the 2-year-olds we tested displayed a noticeable difference in the null subject/null object phenomena between the two testing populations. First of all, only the Chinese-speaking children used null objects to any extent. This, as we suggested, can be due to a different parameter from the one used for null subjects in English-speaking children, one that could possibly be set on the basis of entirely independent data.

A more extensive consideration of the role of the linguistic environment is called for if we take into account the proportions of null arguments used across the different age ranges in Chinese and English. Although the English-speaking children used null subjects frequently, they still used them less frequently than the Chinese children. In the case of null objects, we have suggested that the difference between English- and Chinese-speaking children is a difference related to their grammars: the Chinese-speaking children's grammars allow null objects, while the English-speaking children's grammars do not. However, we do not make the claim that the difference in the use of null subjects is a grammatical difference. This seems to be a prime example of an area where the force of the linguistic environment is felt. Furthermore, as they develop, the use of null arguments by the Chinese-speaking children approaches that of the adult subjects. For example, the Chinese adults produced sentences in which the null argument is interpreted by virtue of a discourse topic established several sentences earlier, as in example (22) above. The youngest children did not exhibit this kind of long-distance topic chaining. The factors that control the pragmatically acceptable use of null arguments (as opposed to their general grammaticality) will need to be learned by Chinese-speaking children, independent of the setting of grammatical parameters. This will be directly related to the linguistic environment.

5. CONCLUSION

In general, this study has shown some support 
for the hypothesis that English-speaking children 
begin speaking a [+pro-drop] language. The 
specific hypothesis of Jaeggli and Hyams (1987), 
that early English is a Chinese-type language, 
received mixed support. Support in favor of
Jaeggli and Hyams' proposal may be seen through
the following points:

i. As early as 2 years old, which was the 
earliest age tested, the English-speaking children 
produced sentences with null subjects at 34.57%. 

ii. The English-speaking children did display
an asymmetry in the use of null subjects, compared
to their very low incidence of null objects.

However, these data also present a dilemma for Jaeggli and Hyams' (1987) theory. They use Roeper's (1986) proposal for the later development of variables in order to account for the proposed null subject/null object asymmetry. Our results show that, apart from the low level of null object errors, the English-speaking children never used any true null objects, consistent with Jaeggli and Hyams' analysis. However, we found this even after the children had developed variables (as indicated by production of wh-questions). According to Hyams, the English-speaking children should have displayed null objects when they developed variables, or else they should have completed null argument parameter restructuring by this time and displayed no null subjects. But our data show that they did use null subjects at this age. Furthermore, the English-speaking children were different from the Chinese-speaking children, in that the latter used both null subjects and null objects during the time we tested them. These observations provide counterevidence to the Jaeggli and Hyams proposal.

This study also shows that although it is 
important to have theory guide research in the 
field of language acquisition, it is likely that the 
data will show where the predictions of the theory 
are in error, or where the theory needs 
refinement. Even if the parameter theory 
generally holds, it still could be true that the 
process of resetting might be slower for some 
parameters than others; in other words, in some 
aspects of the use of null subjects, the 
restructuring can be gradual and take a longer
time than was previously thought. The result of
this study also suggests that the linguistic 
environment or linguistic input shapes the child's 
grammar from a very early stage, e.g., as seen in 
the early cross-language differences in use of both 
null subjects and null objects. 
FOOTNOTES

*Language Acquisition, 2(3), 221-254 (1992).
†Also University of Connecticut
††Also Wesleyan University
†††Also Wellesley College
¹The following abbreviations are used in the glosses:
[e]: null argument
ASP: Aspect

DE (footnote 7); NE (p. 14); MA (p. 20): Chinese particles which
have no stress, and no meaning of their own when used in a
statement

BA (p. 23): a passivizing morpheme in Chinese
²Chinese examples not otherwise credited are provided by QW.
³The null subjects in (5a, b, & c) can be interpreted or understood
as 'sky.'

⁴In his (1989) paper, Huang amends this option in a way which
also allows the matrix subject to be pro, by saying that an
empty pronominal (pro) must be identified by the closest
nominal element if there is one. We will continue to adopt the
(1984) analysis, by which only embedded subjects can be pro.

⁵Roeper, Rooth, Mallis, and Akiyama make this suggestion for a
completely different reason. They discuss an experiment in
which children appear to violate strong crossover for a long
period of time. They account for this finding with the
hypothesis that children begin with pro but not variables as
empty categories. However, there is new evidence which
suggests that children do not actually violate strong crossover
(see McDaniel & McKee, in press; Thornton, 1990), and that
they do have variables.

⁶The experimenter, QW, is a native speaker of Mandarin from
the People's Republic of China. She is also fluent in English.

⁷None of the Chinese children in MLU group 3.5 (2-year-olds)
and 4.5 (3-year-olds) produced any sentences with embedded
clauses. Only one of the 4-year-olds (YD) produced a few
sentences with embedded clauses. However, all three of his
sentences with embedded clauses were produced with an overt
subject, e.g.,

laxijb^ liToIsyi^dmibil dao zhi mOtou fkigzi de. 
He thought old wolf blow not down this wood house DE
'He thought that the old wolf could not blow down the wood
house.'

⁸Statistical comparison between the use of null objects by the
American children and the Chinese children was unnecessary
given the big differences between the ranges of the
percentages.

⁹An interesting comparison can be made with the acquisition of
German. Weissenborn (in press) claims that adult German is
like Chinese in allowing null arguments identified by discourse
topics, but he says that the occurrence of null arguments in
German is more restricted than in Chinese, according to
pragmatic factors. As he points out, German-speaking children
will then need to learn those pragmatic factors which allow for
null arguments in German on the basis of more linguistic
experience than that which allows the Discourse Oriented
Parameter to be set. He indicates that the development of the
correct use of null arguments in German takes some time.

¹⁰CC=Chinese Children; AC=American Children;
AAC=Adjusted American Children; CA=Chinese Adults.
APPENDIX 1: CHILD SUBJECTS

Subject  Language  Age   Sex  MLU   Subj.-drop  Obj.-drop  Adj. Subj.-drop
ZY       Chinese   2;0   F    2.41  48.103      15.952
AN       Chinese   2;3   M    3.60  62.144      21.335
WW       Chinese   2;5   F    4.23  56.937      23.077
HE       Chinese   3;1   F    4.44  58.669      24.159
LX       Chinese   3;4   M    4.27  44.532      12.827
ZZ       Chinese   3;5   F    4.52  33.750      27.143
SK       Chinese   4;1   M    5.04  45.439      22.479
ML       Chinese   4;3   M    4.83  40.756      29.365
YD       Chinese   4;4   M    5.98  28.572      26.250
AR       English   2;5   M    2.69  58.636       8.333     51.177
SR       English   2;8   F    4.10  27.922       9.091     17.388
DS       English   2;10  F    3.74  17.156       7.500      9.091
EL       English   3;6   F    4.58  11.395       3.125      3.949
ER       English   3;8   M    4.80  25.981       5.179      5.390
DR       English   3;9   F    4.65  14.063       0.000      4.087
SP       English   4;?   F    4.49  59.524       0.000     18.831
SM       English   4;4   M    3.84  45.834       0.000      4.167
PT       English   4;5   M    4.51  37.436       0.000     17.179
APPENDIX 2: RESULTS FROM ADULT SUBJECTS

Subject  Subj.-drop  Obj.-drop
BM       33.670       6.719
BX       39.136      22.028
ET       43.363      10.417
LM       32.834      10.976
LP       25.322       8.495
QG       26.423       7.143
QQ       40.94       11.334
WC       40.298       8.929
YL       43.177       6.667
APPENDIX 3: RESULTS FROM CHILD SUBJECTS

Mean percentages of sentences with null subjects and with null objects

Subjects  Subj.-drop  (s.e.)  Obj.-drop  (s.e.)
CC        46.543      3.776   22.533     1.761
AC        33.105      6.120    3.572     1.313
AAC       14.584      5.025
CA        36.129      2.296    8.387     2.123

Testing results arranged according to chronological age

Subj.  Age  MLU   Subj.-drop  (s.e.)  Adj. SD  (s.e.)        Obj.-drop     (s.e.)
CC     2    3.41  55.728       4.098                         20.192        2.165
CC     3    4.41  45.650       7.215                         21.376        4.361
CC     4    5.28  38.252       5.026                         26.031        1.991
AC     2    3.51  34.571      12.427  25.885   12.871         8.308        0.459
AC     3    4.65  17.146       4.484   4.475   [illegible]   [illegible]   [illegible]
AC     4    4.28  47.597       6.437  13.392   [illegible]   [illegible]   [illegible]

Testing results arranged according to MLU

Subj.  Age  MLU   Subj.-drop  (s.e.)  Adj. SD  (s.e.)  Obj.-drop  (s.e.)
CC     2    3.41  55.728       4.098                   20.192     2.165
CC     3    4.41  45.650       7.215                   21.376     4.361
CC     4    5.28  38.252       5.026                   26.031     1.991
AC     2    3.51  34.571      12.427  25.885   12.871   8.308     0.459
AC     3,4  4.48  32.372       7.660   8.933    2.884   1.474     0.991
APPENDIX 4: THE FOLLOW-UP STUDY

Subject  Total # of  # of sentences with  %           %
         sentences   transitive verbs     Subj.-drop  Obj.-drop

HD       295         176                  41.36       38.07
HH       264         132                  47.73       43.94
LQ       288          97                  38.54       35.05
SL       316         122                  49.68       39.34
TJ       344         167                  50.87       44.31
Haskins Laboratories Status Report on Speech Research
1992, SR-109/110, 251-254



Amplitude as a Cue to Word-initial Consonant Length: 

Pattani Malay* 



Arthur S. Abramson†



Word-initial Pattani Malay consonants are short or long. The closures of the "long"
consonants are longer than those of the "short" ones; this is a sufficient cue for perception, but in
voiceless plosives the duration of the silent closure is audible only after a vowel, yet
listeners label such isolated words well and so must use other cues. The peak amplitudes for the
first syllables of disyllabic words are greater for initial long plosives. In this study,
increments of closure duration and amplitude were pitted against each other for original short
plosives and decrements for original long plosives. In tests, duration was by far the more
powerful cue, although amplitude did affect the category boundary. By itself, however,
amplitude is a weak cue. Further work is planned on the possible role of the shaping of the
amplitude contour.



1. INTRODUCTION 

Many languages are described as having a 
phonological distinction of length in vowels or 
consonants, or even both. If the term is taken lit- 
erally, we would expect to find that the underlying 
mechanism is control of the relative timing of the 
articulators. Even so, a single mechanism might 
have a number of acoustic consequences, each of 
which could help in perception. 

Pattani Malay, spoken by about a million ethnic 
Malays in southern Thailand, is unusual not only 
in having a length distinction for consonants in 
word-initial position but also in having one that is 
relevant for all phonetic classes of consonants in 
that position (Chaiyanara, 1983). Here are some 
minimal pairs of words showing the contrast: 

/labo/ 'to profit'    /l:abo/ 'spider'

/make/ 'to eat'       /m:ake/ 'eaten'

/bule/ 'moon'         /b:ule/ 'months'

/katoʔ/ 'to strike'   /k:atoʔ/ 'frog'



The work was supported by NICHD Grant HD-01994 to
Haskins Laboratories. The fieldwork in Thailand was made
possible by a sabbatical leave from The University of
Connecticut in 1988. I am grateful to the National Research
Council of Thailand, the Department of Islamic Studies of The
Prince of Songkhla University, Pattani, and the Department of
Linguistics of Chulalongkorn University, Bangkok for their
warm hospitality and help.



If, indeed, the crucial aspect of the articulatory 
gesture is the duration of the closure or 
constriction, for pairs like the first two it would 
not surprise us to find that the length distinction 
is quite discernible whether in utterance-initial or 
intervocalic position. But what about the stop 
consonants, especially the voiceless unaspirated 
stops of the language? The voiced stops do have 
voicing lead, so if you are close enough, you can 
hear short or longer stretches of glottal pulsing 
during the occlusion. The occlusions of the 
voiceless stops, however, are silent. 

In earlier work (Abramson, 1987), I presented 
acoustic measurements of closure durations for 
the language, showing that the putative length 
categories are well separated by duration. Of 
course, the voiceless stops could not be measured 
in utterance-initial position. In another 
study (Abramson, 1986) I demonstrated, by
systematically increasing the durations of short 
closures and decreasing the durations of long clo- 
sures, that this feature is a sufficient and 
powerful acoustic cue for the perception of the 
distinction. 

As for the voiceless stops, it was conceivable
that the two categories were auditorily
distinguishable in medial position only. This turned out
not to be so in my control tests with unaltered
words. Doing only slightly worse than with the
other classes of consonants, native speakers
rather accurately identified short and long
voiceless stops in isolated words. Among the various
plausible acoustic effects of the mechanism, the
most likely for the largely disyllabic words
involved was the peak amplitude of the first
syllable relative to the second. Indeed, measurements
(Abramson, 1987) revealed that this ratio is
greater for long plosives, that is, both stops and
affricates. Presumably, greater air pressure
accumulated behind the occlusion before release
accounts for the differences. Although both voiced
and voiceless plosives showed a significant
difference, the level of significance was higher for the
latter. No doubt, this is to be explained by
differences in glottal impedance of the airflow.
The difference is not significant for the
continuants.

2. PROCEDURE 
This paper is a progress report of my test of the 
hypothesis that the peak amplitude of the first 
syllable relative to the second in disyllabic words 
is a sufficient cue for the perception of the 
distinction between short and long voiceless stops 
in Pattani Malay. For my major experiments, as
part of an interest in combinations of phonetic 
features underlying the same phonemic 
distinction, I have pitted variants in duration and 
amplitude against each other to determine their 
relative power. 

2.1. Control tests

Although the identifiability of initial short and
long consonants had been demonstrated
(Abramson, 1986), it seemed desirable also to do
control tests for the recordings of my new speaker
for this study. For each of seven minimal pairs of
words I prepared a test containing 20 tokens of
each of the two words, yielding 40 randomized
stimuli. There were two such randomizations for
each word pair. The nasal, lateral, fricative, and
plosive categories were represented. The plosives
included voiced and voiceless stops and voiceless
affricates. (Unfortunately, my only pair of voiced
affricates included a word, as I learned later, that
would have embarrassed the women among the
subjects, so I could not use that test.) The subjects
were 30 undergraduate students, all native
speakers of Pattani Malay, at the Prince of
Songkhla University, Pattani, Thailand.

2.2. Amplitude vs. duration

To test for the relative power of amplitude and
duration, three pairs of words with velar, dental,
and labial short and long stops respectively were
used. All of them were recorded at the end of the
carrier sentence /die kaxo/ 'he said.' By means of
the Haskins Laboratories Waveform Editing and
Display System (WENDY), the stop closure of the
short member of each pair was lengthened in
20-ms steps until it reached or exceeded the duration
of its long counterpart. The closure of the long
member was shortened in the same way. The first
syllable of each variant of the original short stop
was increased in amplitude in five 2-dB steps.
Likewise, the first syllable of each variant of the
original long stop was decreased in amplitude in
five 2-dB steps. Two test orders were recorded
from randomizations of two tokens each of all the
resulting stimuli and played to 30 native speakers
for identification of the key words.
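
The stimulus design just described can be sketched in a few lines of Python. This is only an illustration, not the WENDY procedure itself: the function names and the closure values of 60 ms and 220 ms are hypothetical, chosen so that the lengthening yields nine durations, and the decibel-to-gain conversion is the standard amplitude formula rather than anything stated in the text.

```python
from itertools import product


def closure_durations(start_ms, target_ms, step_ms=20):
    """Lengthen a closure in fixed steps until it reaches or exceeds the target."""
    durations = [start_ms]
    while durations[-1] < target_ms:
        durations.append(durations[-1] + step_ms)
    return durations


def amplitude_levels(n_steps=5, step_db=2, sign=1):
    """Original level (0 dB) plus n_steps changes of step_db each.

    sign=+1 gives increments (original short stop);
    sign=-1 gives decrements (original long stop).
    """
    return [sign * step_db * i for i in range(n_steps + 1)]


def db_to_gain(db):
    """Convert a level change in dB to a linear amplitude scale factor."""
    return 10 ** (db / 20)


# Hypothetical closure durations in ms; the paper does not report them here.
SHORT_MS, LONG_MS = 60, 220

durations = closure_durations(SHORT_MS, LONG_MS)
levels = amplitude_levels()
# Every duration is combined with every amplitude level.
stimuli = list(product(durations, levels))
# Two tokens of each stimulus per test order, as in the text.
trials_per_order = 2 * len(stimuli)
```

Under these assumed closure values the grid has nine durations by six amplitude levels, the same dimensions as the continuum reported for Figure 1.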

2.3. Amplitude in isolated words

The perceptual efficacy of amplitude without
help from closure duration was tested by taking
all the amplitude variants from the original short
and long forms of one of the word pairs in section
2.2. Two test orders were recorded from
randomizations of four tokens of each stimulus and
played to 30 native speakers.

3. RESULTS 

3.1. Control tests

The previously demonstrated identifiability of 
the utterance-initial consonants (Abramson, 1986) 
was reaffirmed. The major difference is that the
voiceless long affricates in this sample were
labeled correctly 96% of the time, whereas in the
last study they were just above chance at 55%.

3.2. Amplitude vs. duration

Because of the limitation on space, the results of
only two of the experiments are given here. Figure
1 gives the responses of 30 native speakers to nine
durations in 20-msec steps of the [k]-closure in
/kameŋ/ 'goat' combined with six amplitude levels
in 2-dB steps. The vertical axis shows the
percentage identification as short /k/. The earlier
crossover of the higher-amplitude curves at the
50% point to the long-/k:/ category, giving
judgments of /k:ameŋ/ 'goatlike,' is highly
significant [F(40, 1160) = 9.0, p < .001];
nevertheless, the values of duration at the short
end are very little affected. The opposite
procedure, shortening original long /k:/ and lowering the
amplitude, yielded similar results, as shown in
Figure 2. The results are essentially the same for
the other two places of articulation.
Figure 1. Responses to /kameŋ/ 'goat' and its variants
with increased closure duration and first-syllable
amplitude.

Figure 2. Responses to /k:ameŋ/ 'goatlike' and its variants
with decreased closure duration and first-syllable
amplitude.

3.3. Amplitude in isolated words

In Figure 3 both the short and long responses
are plotted for increments of amplitude on original
/pagi/ 'morning.' While the two curves converge,
they never cross each other. Figure 4 shows rather
similar effects for decrements of amplitude
combined with isolated tokens of /p:agi/ 'early
morning.'



Figure 3. Responses to isolated /pagi/ 'morning' and its
variants with increased first-syllable amplitude.



Figure 4. Responses to isolated /p:agi/ 'early morning'
and its variants with decreased first-syllable amplitude.



4. CONCLUSION

It is clear that when both features are present, 
duration is dominant; nevertheless, the boundary 
between the two perceptual categories is 
significantly affected by relative amplitude. In 
utterance-initial position, however, relative 
amplitude is only a weak cue, apparently 
secondary to something else. 

To understand how the length distinction is
perceived in utterance-initial voiceless plosives,
perhaps further work should be done on the
possible role of the shaping of the amplitude
contour. That is, maybe a finer analysis of
utterances and a more complicated making of
stimuli will show, for example, that the rise-time
of the amplitude carries more weight than the
peak value, or that the two work together. Indeed,
a very preliminary look at this time suggests that
the rise time is shorter in the production of the
long stops. Also, it is possible that the major
amplitude difference is confined to the region of
the release burst. Other features that have not
seemed promising so far, such as fundamental
frequency and rate of formant transitions, may
have to be examined more closely too.
Tone Splits and Voicing Shifts in Thai:
Phonetic Plausibility*

Arthur S. Abramson† and Donna Erickson††



At the time of the emergence of its daughter languages, Proto-Tai is said to have had three
phonemic tones on "smooth" syllables and four voicing categories for initial consonants,
which would have been inherited by Old Thai (Siamese). Correlations between tones and
initial consonants across the Tai languages have led to the positing of tonal splits
conditioned by the voicing states of initial consonants with a subsequent shifting of voicing
features in certain lexical classes. This change purportedly underlies the system of five
tones and three consonantal voicing categories of modern Thai. Thus for each tone of Old
Thai, words with initial voiced consonants developed a lower tone and words with initial
voiceless consonants, a higher tone.

It has been shown for a number of languages that right after the release of a voiced stop
consonant the fundamental frequency (F0) of the voice is likely to be lower than after the
release of a voiceless stop and that such F0 perturbations can influence phonemic
judgments of voicing. This led to the designing of two experiments to test the phonetic
plausibility of the argument: (1) CV syllables were synthesized with three values of voice
onset time (VOT) acceptable as Thai /b p ph/. Each of these was combined with a
continuum of F0 contours that had previously been divided perceptually into the high, mid
and low tones. These syllables were played to native speakers of Thai for tonal
identification. (2) Labial stops with nine values of VOT separable into /b p ph/ categories
were coupled on synthetic mid-tone and low-tone CV syllables with upward and downward
F0 onsets varying in extent and duration. The resulting syllables were played for
identification of the initial consonants. The historical argument receives modest support,
especially from the second experiment, suggesting that during a period of tone splitting,
under the influence of audible F0 perturbations, speakers could have brought about the
rephonemicization of the old consonant categories. Thus, these results give direct support
to the argument that pitch factors led to voicing shifts but only indirect support to the
claim that they gave rise to tone splits.



INTRODUCTION 
If the distinctive tones of present-day Central
Thai (Siamese) are the outcome of a series of
developments over the centuries from an early
simpler Proto-Tai tone system, or even a pristine
state of tonelessness, we are beset with a problem
common to all diachronic phonology. Can the
causes of sound change be found? For Thai, as for
some other Asian languages, this problem is
complicated and made even more interesting by an
apparent intersection of changing tonal features
and shifting voicing states of word-initial
consonants. It is our wish here to try to shed
phonetic light on this aspect of the history of Thai.

In learning their language, children are likely to
deviate ever so slightly in pronunciation habits
from their adult models in ways that are largely
unnoticeable at the time (Gray, 1939; Vendryes,
1923). Insofar as these shifts are not random, they
may accumulate gradually over the generations,
resulting in sound changes with phonological
consequences. Linguists have concentrated on
these structural alterations, describing them
systematically and purporting to show that, by
and large, they are so regular that they can be
stated in terms of "laws" for individual languages
or language families. Except for noting that most
of these changes, once they have been traced, are
not phonetically improbable (e.g., /m/ is not likely
to become /g/), they seldom find underlying
phonetic mechanisms that might have brought
these changes about.

With the recent advance of our understanding of
the production and perception of speech, it is
tempting for the experimental phonetician to
believe that phonetic hypotheses on the causes of
sound change should be testable in the laboratory
(Ohala, 1974). For such research, without any way
to resurrect long-dead informants for a brief stint
of field work, the most that we can hope to do is to
test the phonetic plausibility of these hypotheses
by using present-day speakers. It must be stressed
that it is only the plausibility of a posited causal
relationship between sound change and particular
phonetic mechanisms that can be tested.

A number of studies on the plausibility of
postulated phonetic mechanisms of change have
appeared in recent years. For example, Whalen
and Beddor (1989) have published experimental
data compatible with an explanation of the rise of
a nasal feature in Eastern Algonquian. As for the
emergence of distinctive tones, Hombert, Ohala
and Ewan (1979) have provided an excellent
critical review of the instrumental and
experimental work on this topic.

The term tonogenesis, apparently first used by 
James Matisoff (1970, 1973), can mean the 
emergence of phonologically distinctive tones in a 
previously toneless language under the influence 
of certain contextual features. Another use of the 
term has been as a label for the splitting of old 
tonal categories into a larger number of tones. J. 
Marvin Brown (1975) speaks of the "great tone
split... that swept through China and northern
Southeast Asia nearly a thousand years ago."

During the time of the emergence of its
daughter languages, Proto-Tai is generally said to
have had four voicing categories for initial
consonants and three phonemic tones on "smooth"
syllables, i.e., those ending in a nasal, glide, or
long vowel, which would all have been inherited
by Old Thai (Siamese). If we make our focus for
the moment not the tones but the initial
consonants, we find the consensus of the various
sources (e.g., Li, 1977) to be that the voicing states
of some of these consonants changed under the
influence of the pitch slopes as the tones emerged.
We epitomize the situation with the labial stops:

Proto-Tai *ʔb *b *p *ph

Central Thai b p ph 

We see that in modern Central Thai we have /ph/
from two sources, as is reflected in the Thai
writing system. The correspondences are not
exactly the same for all Tai varieties; for example,
in Chiangmai /*b/ > /p/. Our emphasis here,
however, is on Central Thai. The phonetic nature
of /*ʔb/ is problematic (see Erickson, 1975 for a
discussion). Haudricourt (1956) makes the rather
tempting suggestion of [bʰ] as an intermediate
stage in the shift from /*b/ to /ph/.

With help from the writing systems, study of
correlations between tones and initial consonants
has led to the positing of tonal splits conditioned
by the shifting voicing states of those consonants
(Haudricourt, 1956; Li, 1947, 1977; Maspero,
1911). That is, ignoring the special problem of one
of the four classes of consonants, the so-called
glottalized consonants (see Erickson, 1975), we
find that for each tonal category of Old Thai, words
with initial voiced consonants developed a lower
tone and words with initial voiceless consonants, a
higher tone. Thus the three Proto-Tai tones on
smooth syllables, named simply A, B, and C in
the absence of knowledge of their phonetic nature,
would have split into six. In fact, given the
vicissitudes of the spread of phonological change
over related languages, we find that Central Thai,
which is the dialect of the Bangkok region and the
basis of the official language of Thailand, has only
five tones, while other regional dialects and other
Tai languages have six or more, with differences
among them in pitch contours as well. In a chart
adapted from the work of Fang Kuei Li (1977, pp.
24-33), we give an outline of the tonal shifts from
Proto-Tai to Central Thai on page 257.

Aside from these historical hypotheses, it 
has been known for some time that in human 
speech the fundamental frequency (F0) of
a syllable beginning with a voiced consonant 
is likely to be lower, for at least part of its 
duration, than that of a syllable beginning with a 
voiceless consonant (House & Fairbanks, 1953; 
Lehiste & Peterson, 1961). Indeed, it is 
remarkable that the early historical linguists 
logically inferred this likelihood without access to 
supporting physiological and acoustic phonetic 
research! 
PROTO-TAI           CENTRAL THAI
Tone  Initial       Tone            Examples

A     Voiceless     Mid or rising   paj 'to go'         fǒn 'rain'
      Voiced        Mid             naa 'rice field'    wan 'day'

B     Voiceless     Low             kàw 'old'           phàa 'to split'
      Voiced        Falling         phɔ̂ɔ 'father'       nâŋ 'to sit'

C     Voiceless     Falling         kâw 'nine'          nâa 'face'
      Voiced        High            thɔ́ɔŋ 'belly'       máa 'horse'



For Thai (Erickson, 1975; Gandour, 1974) and
other languages (Hombert, 1975), it has been
found that F0 is likely to rise upon release of a
voiced initial and fall upon release of a voiceless
initial; both of these perturbations tend to end and
blend in with the prosodic pattern of the syllable
as determined by the sentence intonation and, in
tone languages, the lexical tone. Other studies
(e.g., Kohler, 1982; Lofqvist, Baer, McGarr, &
Story, 1989; Ohde, 1984; Umeda, 1981) do not
support a clearcut dichotomy between rising and
falling perturbations. Rather, the F0 upon release
of the voiced stop may in fact be on a level with, or
at least not separable from, the rest of the
contour; it may even fall a bit, or it may indeed
rise; the crucial difference is that it is lower than
the F0 onset upon release of a voiceless stop.

Physiological basis. As shown in literature
reviews (Erickson, 1975; Ohala, 1978; Hombert et
al., 1979), much ink has been spilled in support of
various mechanisms that might underlie the F0
differences. Varying amounts of air flow governed
by glottal size do not last long enough after stop
release to account for the full effect. The role of
myoelastic factors has long seemed much more
probable. This would have to be some kind of
difference in tension of the vocal folds. The
problem has been to demonstrate this and tell
what the mechanism is. One conjecture was
vertical tension (Halle & Stevens, 1971), although
it was hard to see how this might be executed, in
spite of the finding of a higher position of the
larynx for voiceless stops (Ewan & Krones, 1974).
We are convinced by the recent work of Anders
Lofqvist and his colleagues (Lofqvist et al., 1989;
Lofqvist & McGowan, in press) that responsibility
lies with varying degrees of contraction of the
cricothyroid muscle used for control of vocal-fold
tension to maintain or suppress vibration. Greater
amounts of tension to help suppress voicing upon
opening the glottis, combined with aerodynamic
consequences, will cause higher F0 values in the
speech signal.

Perception. The historical argument depends, of
course, on the audibility of the F0 differences.
Through psychoacoustic tests, Hombert (1975)
showed that F0 movements of comparable
magnitude are discriminable. It has also been
found that either in somewhat exaggerated form
(Haggard, Ambler, & Callow, 1970) or within more
or less normal ranges (Abramson & Lisker, 1985;
Fujimura, 1971; Kohler, 1985; Silverman, 1986;
Whalen, Abramson, Lisker, & Mody, 1990) F0
perturbations can influence judgments of voicing
in stops in such languages as English, Japanese,
and German.

Goals of this study. If we assume these findings
in production and perception to be universal and
thus relevant to Southwestern Tai, the branch
that gave rise to Thai, we might suppose that
speakers of the language, already accustomed to
the three-way tonal contrast of Proto-Tai, were
psychologically receptive to the pitch fluctuations
normally occurring with voicing distinctions. We
might suppose that attention was gradually
shifted from the increasingly unstable voicing
states of the initial consonants to the effects of the
pitch perturbations on the following vowels.
Increasing awareness of the perturbations could
have led, through auditory feedback to production
mechanisms, to enhancement of the effect by
means of articulatory reinforcement and
exaggeration of pitch differences. In this way,
phonemicization of the pitch fluctuations came
about, yielding an increase in tonal categories and
helping to keep the old lexical classes apart, while
the consonantal voicing categories decayed,
shifted, and even coalesced.
To examine the plausibility of the foregoing
historical arguments, we carried out experiments
on the possible perceptual interaction between
tones and initial stop consonants in the Thai
language of today. That is, on the assumption of
diachronic interaction between initial consonants
and tones, we tested two hypotheses on speakers
of modern Central Thai: (1) Perturbations of
fundamental frequency should affect the
perception of voicing distinctions in initial stop
consonants. (2) The voicing states of initial stop
consonants should affect the perception of tones. It
must be understood that for both hypotheses we
are not saying that the factors mentioned will be
primary for the perception of these phonological
distinctions. Rather, support for the hypotheses
will be obtained if the boundaries between the
perceptual categories are significantly affected. So
as to have incremental control over the
dimensions of interest to us, we followed the
common practice of using synthetic speech.

EXPERIMENT I: VOICE ONSET TIME 
An underlying assumption in these experiments, 
borne out by earlier work, is that the voiced, 
voiceless unaspirated, and voiceless aspirated 
stops of Thai lie along a dimension of voice onset 
time (VOT), namely, the temporal relation be- 
tween the closing of the glottis for audible pulsing 
and the release of the occlusion of the initial stop 
(e.g., Lisker & Abramson, 1964; Abramson & 
Lisker, 1965; Abramson, 1989). For /b d/, voicing 
begins somewhat before the release, yielding 
"prevoicing" or "voicing lead," i.e., audible glottal 
pulsing during the occlusion. For /p t k/, voicing 
begins at the release or shortly thereafter. For /ph 
th kh/, voicing begins somewhat after the release; 
during the resulting "voicing lag," turbulent air 
coming through the open glottis excites the supra- 
glottal vocal tract, yielding aspiration. These dif- 
ferences along the VOT dimension have not only 
been found in the acoustic signals but have also 
been shown to be perceptually relevant. 

Procedure. In Experiment I we replicated the old 
work on the perceptual efficacy of VOT in Thai in 
order to establish a baseline for the testing of our 
two hypotheses. Using the Haskins Laboratories 
parallel-resonance synthesizer, we made as our 
basic pattern for all stimuli a set of formant 
transitions appropriate to the labial place of 
articulation, followed by three steady-state 
formants appropriate to the Thai long vowel /aa/. 
We set the voice source of the synthesizer to 
produce 37 VOT variants, ranging from 150 ms 
before the stop release to 150 ms after the release. 
We did this in 10-msec steps except for the region 
around the release, where we used 5-msec steps 
from 10 ms before the release until 50 ms after the 
release. Thus, stops with VOT before the release, 
i.e., voicing lead, simulated varying amounts of 
closure voicing. All the rest of the stimuli had a 
silent labial closure. All VOTs after the release 
had their upper two formants excited by a noise 
source and no excitation in the first formant for 
the period of voicing lag to simulate aspiration 
with an open glottis. We made a satisfactory mid 
tone by means of a level F0 at 120 Hz with, for 
naturalness in utterance-final position, a slight 
fall at the end (Abramson, 1962). We prepared 
eight tape-recorded randomizations of the 
synthetic stimuli with two tokens of each one in 
each of the test orders. Thus, each subject could 
have responded 16 times to each stimulus; 
however, depending on their availability, the 
listeners varied somewhat in how many tests they 
took. We played the tests through headphones to 
48 native speakers of Central Thai at 
Ramkhamhaeng University and the now defunct 
Central Institute of English Language for 
identification in Thai script as /baa/ 'teacher,' /paa/ 
'to throw,' or /phaa/ 'to lead.' 
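
The stimulus continuum described in this Procedure can be enumerated directly. The following Python sketch is illustrative only: the function and constant names are ours, not part of the original study, and it assumes that the 5-ms steps run from -10 ms through +50 ms inclusive, which is the reading that yields the 37 variants reported above.

```python
def vot_continuum():
    """Return the VOT values in ms (negative = voicing lead)."""
    lead = list(range(-150, -10, 10))   # -150 ... -20 in 10-ms steps
    fine = list(range(-10, 51, 5))      # -10 ... 50 in 5-ms steps (assumed inclusive)
    lag = list(range(60, 151, 10))      # 60 ... 150 in 10-ms steps
    return lead + fine + lag


# Eight randomizations with two tokens of each stimulus per test order.
MAX_RESPONSES_PER_STIMULUS = 8 * 2

vots = vot_continuum()
print(len(vots), "VOT variants;",
      MAX_RESPONSES_PER_STIMULUS, "possible responses per stimulus")
```

The count of 37 and the ceiling of 16 responses per stimulus both follow from the figures given in the text; nothing else about the synthesis is modeled here.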

Results. The results of Experiment I are given in 
Figure 1. The ordinate shows the percentage of 
responses to each stimulus as one of the voicing 
states, which are indicated by the coded lines. The 
VOT values of the stimuli are arrayed at the 
bottom along the abscissa. 



[Figure 1 plot, titled "Baseline Experiment." Abscissa: VOT in msec, from -150 to 150.] 

Figure 1. Identification of synthetic labial stops varying 
in voice onset time. N=440. 
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The category boundaries at the 50% crossover 
points, -7 ms for /b/-/p/ and 26 ms for /p/-/ph/, are 
very similar to those found in earlier work 
(Abramson & Lisker, 1965; Lisker & Abramson, 
1970). Probably because of shortcomings in the 
synthesis, the /p/ category does not reach as high a 
peak as the other two. With these data in hand, 
we were ready to go on to Experiment II to test 
the first hypothesis. 
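
A 50% crossover of the kind reported above can be estimated by linear interpolation between the two adjacent stimuli whose identification percentages straddle 50%. The sketch below assumes that method (the text does not state how the crossovers were computed), and the percentages in it are hypothetical, not the data of Figure 1.

```python
def crossover(vot_ms, percent, level=50.0):
    """Return the VOT (ms) where an identification curve first crosses `level`.

    Linear interpolation between adjacent stimuli is assumed; returns None
    if the curve never crosses the level.
    """
    points = list(zip(vot_ms, percent))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 != y1 and min(y0, y1) <= level <= max(y0, y1):
            return x0 + (level - y0) * (x1 - x0) / (y1 - y0)
    return None


# Hypothetical /b/ identification percentages, for illustration only.
vot_ms = [-20, -10, -5, 0, 5]
pct_b = [98, 70, 40, 10, 2]
boundary = crossover(vot_ms, pct_b)
print(round(boundary, 1))
```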

EXPERIMENT II: F0 SHIFTS AND VOT 

Procedure. We then turned to the matter of the 
effect of initial pitch perturbations on the 
identification of voicing states. We made our 
stimuli by varying the features of VOT and the 
extent of initial F0 shifts in the syllable pattern of 
Experiment I. With the data from Experiment I as 
a baseline, we chose nine VOT values to span the 
three voicing categories: -100, -20, 5, 10, 15, 20, 
25, 30, and 80 ms. We imposed five F0 onsets 
upon each VOT variant. In addition to a flat onset 
at the 120 Hz level of our mid tone, we also had 
two downward shifts from 130 Hz and 140 Hz, as 
well as two upward shifts from 110 Hz and 100 
Hz. Production data (Erickson, 1974) suggested 
that this 40-Hz range was reasonable. The shifts 
started at the first glottal pulse after the release 
of the stop and lasted 100 ms. We presented 
three randomizations of the stimuli through 
headphones to 46 of our original listeners for 
identification as /b/, /p/, or /ph/ in Thai script, as in 
the previous experiment. 
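
The design of Experiment II crosses nine VOT values with five F0 onsets. The sketch below enumerates that grid and models each onset shift as a straight-line glide to the 120-Hz mid-tone level over 100 ms. The linear shape of the glide is our assumption; the text gives only the onset frequencies and the duration of the shift. The slight utterance-final fall is ignored, and all names are illustrative.

```python
from itertools import product

VOT_MS = [-100, -20, 5, 10, 15, 20, 25, 30, 80]   # values as listed in the text
F0_ONSET_HZ = [100, 110, 120, 130, 140]
MID_TONE_HZ = 120
SHIFT_MS = 100


def f0_at(t_ms, onset_hz):
    """F0 in Hz at t_ms after the first glottal pulse (linear glide assumed)."""
    if t_ms >= SHIFT_MS:
        return float(MID_TONE_HZ)
    return onset_hz + (MID_TONE_HZ - onset_hz) * t_ms / SHIFT_MS


stimuli = list(product(VOT_MS, F0_ONSET_HZ))
print(len(stimuli), "stimuli per randomization")
```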

Results. The results of Experiment II are given 
in Figure 2. From top to bottom the three graphs 
show identification of the stimuli as /b/, /p/, and 
/ph/, respectively. Along the abscissa are displayed 
VOT values, ranging from -100 ms to 80 ms. 
The ordinate shows the percentage of responses 
given to the various F0 conditions for each of the 
VOT values. There is a coded line for each of the 
F0 onsets. 

An analysis of variance showed a high level of 
significance for the interaction between voicing 
state and F0 onset for /b/ and /p/: F(8, 360)=2.67, p 
< .008. Looking at the top graph, we see that the 
number of /b/ responses increases systematically 
as the initial F0 value decreases. That is, as the 
F0 value goes down, more stimuli are identified 
as /b/ at a later VOT value. In the middle graph, 
we again find an effect but in reverse. 

Figure 2. Effects of F0 shifts on identifications. N=224. 

Higher F0 onsets increase the number of /p/ 
responses and thus yield earlier perceptual 
crossovers between /b/ and /p/. The bottom graph, 
however, shows a very tight clustering of the 
curves for /ph/ with no obvious effect of F0; this 
effect is not significant. 

EXPERIMENT III: VOICING STATES 
AND TONE LABELS 

We turned next to our second hypothesis, the 
one asserting that the voicing states of initial con- 
sonants will affect category boundaries for tones. 

Procedure. To examine this question, we used a 
fan-shaped series of F0 contours with a common 
origin, which had previously (Abramson, 1978) 
been shown to be perceptually divisible into the 
three static tones, high, mid, and low. The 16 
tonal variants all started at 120 Hz and moved to 
end points ranging from 152 to 92 Hz in 4-Hz 
steps. We synthesized syllables with VOT values 
suitable for /b p ph/ and formant frequencies for 
the vowel /aa/. The syllable meant to be heard as 
/baa/ had a VOT of -100 ms, /paa/, 0 ms, and 
/phaa/, 80 ms. The onset of each F0 contour began 
with the release of the stop. For /b/, the simulated 
closure, i.e., the 100 ms of voicing lead before the 
release, was at a level F0 of 100 Hz. Several 
randomized test orders were played through 
headphones to nine native speakers of Central Thai at 
The University of Massachusetts in Amherst. 
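
The fan of tonal variants can be listed from the figures given in this Procedure. The sketch below only enumerates end points and stimulus combinations; it does not model the shape of the contours between the common 120-Hz origin and each end point, which the text does not specify. Variable names are illustrative.

```python
START_HZ = 120
END_POINTS_HZ = list(range(152, 91, -4))    # 152, 148, ..., 92
VOT_MS = {"b": -100, "p": 0, "ph": 80}      # VOT per intended voicing state

rising = [hz for hz in END_POINTS_HZ if hz > START_HZ]
level = [hz for hz in END_POINTS_HZ if hz == START_HZ]
falling = [hz for hz in END_POINTS_HZ if hz < START_HZ]

stimuli = [(stop, end_hz) for stop in VOT_MS for end_hz in END_POINTS_HZ]
print(len(END_POINTS_HZ), "contours:", len(rising), "rising,",
      len(level), "level,", len(falling), "falling;",
      len(stimuli), "stimuli in all")
```

Note that the fan is not symmetric about the origin: the end points extend 32 Hz above 120 Hz but only 28 Hz below it.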

Results. The subjects fully accepted the three 
VOT values as the intended voicing states. Their 
tone labels are given in Figure 3. From top to 
bottom, the three graphs show how the listeners 
labeled the F0 contours as low, mid, and high 
tones respectively. The coded lines show the 
effects of the perceived voicing states of the stops 
on the tonal judgments. Along the abscissa are 
given the final F0 values of the tonal variants. 



[Figure 3 plot; line labels: /b/, /p/, /ph/.] 

Figure 3. Effects of voicing states on tone labels. 



Tone Splits and Voicing Shifts in Thai: Phonetic Plausibility 






Each point plotted gives the percentage of 
responses to that stimulus as the tone named on 
the ordinate. 

An analysis of variance for the areas under the 
curves in the top graph shows a significant effect 
of the voicing states on the low-tone responses: 
F(2, 16)=4.33, p < .04. As shown by a post-hoc t- 
test, the main effect (p < .05) is that initial /b/ 
yielded a greater number of low-tone responses 
than the other two stops. We can see from the 50% 
crossover points that the final F0 can be higher for 
/baa/ than for /p ph/ and still be identified as a low 
tone. 

The data plotted for the mid tone in the middle 
graph also show a significant interaction in an 
analysis of variance between voicing states and 
tone responses: F(2, 16)=8.93, p < .003. Post-hoc t- 
tests (p < .01) show that it is /phaa/ that has a 
larger number of mid-tone responses than the 
other two syllables. 

As for the high tone in the bottom graph, again 
an analysis of variance shows a significant 
interaction: F(2, 16)=8.23, p < .004. Just as for the 
low tone, here too post-hoc t-tests (p < .01) show 
that the effect comes from the difference between 
/b/ and the other two stops. That is, initial /b/ 
gives a higher number of high-tone responses than 
/p/ or /ph/. Also, note the earlier 50% crossover 
point for /b/ between the mid and high tones. The 
crossover points for /p/ and /ph/, however, lie on 
top of each other. 

CONCLUSION 

It is clear from our data that fundamental- 
frequency perturbations can affect the placement 
of perceptual boundaries along the dimension of 
voice onset time. It is also true that the voicing 
states of initial stop consonants can affect the 
labeling of a continuous series of fundamental- 
frequency contours as tones. There are details of 
the various interactions, such as the seemingly 
paradoxically opposed boundary shifts for /baa/ 
with the low and high tones, that will require 
more thought and, perhaps, further investigation. 

By and large, then, our perceptual data seem to 
support the historical arguments concerning in- 
teractions between tone splits and voicing shifts. 
As pitch perturbations loomed larger in the con- 
sciousness of the community and gradually took 
on a distinctive function, one might suppose that 
the voicing states of initial consonants would have 
been reassessed perceptually and rearticulated to 
furnish new production norms. A combination of 
these factors would have brought about shifts in 
tonal and consonantal categories. 
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A Constraint on the Expressive Timing of a 
Melodic Gesture: Evidence from Performance and 

Aesthetic Judgment* 

Bruno H. Repp 



Discussions of music performance often stress diversity and artistic freedom, yet there is 
general agreement that interpretation is not arbitrary and that there are standards that 
performances can be judged by. However, there have been few objective demonstrations of 
any extant constraints on music performance and judgment, particularly at the level of 
expressive microstructure. The present study illustrates such a constraint in one specific 
case: the expressive timing of a melodic gesture that occurs repeatedly in Robert 
Schumann's famous piano piece, "Träumerei." Tone onset timing measurements in 28 
recorded performances by famous pianists suggest that the most common "temporal 
shape" of this (nominally isochronous) musical gesture is parabolic, and that individual 
variations can be described largely by varying a single degree of freedom of the parabolic 
timing function. The aesthetic validity of this apparent constraint on local performance 
timing was investigated in a perceptual experiment. Listeners judged a variety of timing 
patterns (original parabolic, shifted parabolic, and nonparabolic) imposed on the same 
melodic gesture, produced on an electronic piano under MIDI control. The original 
parabolic patterns received the highest ratings from musically trained listeners. 
(Musically untrained listeners were unable to give consistent judgments.) The results 
support the hypothesis that there are classes of optimal temporal shapes for melodic 
gestures in music performance, and that musically acculturated listeners know and expect 
these shapes. Being classes of shapes, they represent flexible constraints within which 
artistic freedom and individual preference can manifest themselves. 



INTRODUCTION 

Much has been written about music perfor- 
mance, with the emphasis generally being on the 
diversity among interpretations by different 
artists and in different historic periods. Yet, in 
each period (and quite likely across periods) there 
have also been generally accepted performance 
standards, which were reflected in music educa- 
tion, performance practice, and music criticism. 



This research was made possible through the generosity of 
Haskins Laboratories (Michael Studdert-Kennedy, president). 
Additional support came from NIH BRSG Grant RR05596 to 
the Laboratories. A short version of this paper was presented 
at the Second International Conference on Music Perception 
and Cognition in Los Angeles, February 1992. 

I am grateful to Pat Shove for many stimulating discussions. 



The nature of these standards has been discussed 
in a number of treatises (most notably Lussy, 
1882), but rarely in objective and quantitative 
terms. This is particularly true with regard to the 
expressive microstructure of performance: all 
those variations that are not easily captured in 
music notation but that are essential to the com- 
municative function of interpretation. Musicians 
are usually only dimly aware of these variations, 
which they control intuitively rather than deliber- 
ately. Similarly, musical listeners perceive the 
structure and expression conveyed by these varia- 
tions without being aware of the microstructure as 
such. It has been up to experimental psychologists 
to discover and measure these variations objec- 
tively (e.g., Palmer, 1989; Repp, 1990; 
Gabrielsson, Bengtsson, & Gabrielsson, 1983; 
Seashore, 1938/67; Shaffer, 1981). 
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Even though a number of studies of expressive 
microstructure have been published, they have 
rarely provided evidence of constraints on 
performance parameters. The principal reason is 
that they usually were based on very small 
samples of performances, so no statements could 
be made about the generality of particular 
microstructural patterns. Hypotheses about the 
generality of such patterns, as instantiated for 
example in the performance rules of Friberg 
(1991) or in the hierarchical timing model of Todd 
(1985), remain to be validated on large 
performance data bases. Moreover, studies of 
music performance have rarely combined 
measurements with formal perceptual evaluations 
to confirm the aesthetic validity of the 
hypothesized or measured patterns. 

The work of Johan Sundberg and his colleagues 
is a significant exception (see Sundberg, Friberg, 
& Frydén, 1991). A study by Sundberg and 
Verrillo (1980) had a purpose very similar to that 
of the present research. These authors were con- 
cerned with the temporal shape of the ritardando, 
the gradual slowing of tempo commonly observed 
in performance at the ends of most compositions. 
They asked whether there was an optimal time 
course for this slowing down which performers ob- 
served and listeners expected. They selected 24 
recordings of rhythmically uniform music, mostly 
by J. S. Bach, and measured the onset intervals 
between successive tones, whose reciprocals they 
then plotted as local tempo decreasing over time. 
Sundberg and Verrillo found that the average 
function resulting from these measurements could 
be described in terms of two linear segments, the 
second steeper in slope than the first. They also 
conducted a perceptual test in which musically 
experienced listeners were presented with ex- 
cerpts that exhibited various forms of ritardando, 
some corresponding to the observed average func- 
tion and others having deviant temporal shapes of 
various kinds. The listeners tended to prefer the 
ritardandi corresponding to the original 
performances. 

In a later discussion of the same data, Kronman 
and Sundberg (1987) abandoned the bilinear 
model and instead fitted the average data points 
with a single curve (a square-root function), which 
they claimed was similar to that observed when 
other rhythmic motor activities, such as locomo- 
tion, come to a smooth halt. (Specific references to 
relevant literature were not given.) This function 
thus may represent a rather general constraint on 
the optimal shape of the musical ritardando. 



Although these studies exhibit some method- 
ological weaknesses and therefore can only be re- 
garded as preliminary, they nevertheless set a 
good precedent for the kind of approach to be 
taken in investigations of performance 
constraints. 

The present investigation concerns possible 
constraints on the temporal shape of an expressive 
melodic gesture. (The performance-oriented term 
"melodic gesture" is used here to refer to a brief 
sequence of melody tones that is executed as a 
single expressive unit. The equivalent term 
"(rhythmic) group" is often used in the musicologi- 
cal literature.) By a constraint is meant a 
restriction on the performance patterns that occur 
in expert interpretations and that are judged 
acceptable by musically experienced listeners. 
Melodic gestures occur throughout Western music 
in most styles, and they come in a large variety of 
forms. It seems unlikely that all these forms are 
subject to any single performance constraint. The 
nature of these constraints may vary as a function 
of many factors, including tempo, metric and 
harmonic structure, style, and so on. Rather than 
searching for a universal constraint, the present 
study focused on the timing pattern of one 
particular melodic gesture. If it could be 
demonstrated that this pattern is subject to a 
significant constraint in performance and in 
perceptual judgment, this would at least provide 
an existence proof of such constraints on 
expressive microstructure. Moreover, by focusing 
on a specific case, the constraint can be 
characterized rather precisely. Questions about its 
origin and generality may then form the basis for 
future research. 

The melodic gesture under investigation occurs 
in Robert Schumann's famous piano piece, 
"Träumerei" (No. 7 of "Kinderszenen," op. 15), 
whose score is shown in Figure 1. The melodic 
gesture moves from bar 1 into bar 2. (See Figure 3 
below.) In its notated form, it consists 
of five eighth-notes ascending in pitch and a final 
longer note which repeats the pitch of the 
preceding eighth-note. The gesture recurs six 
times (eight times, if the obligatory repeat of the 
first eight bars is counted) during the piece, 
with some variations in key and interval 
structure. These recurrences are aligned vertically 
in Figure 1. The gesture is of central importance 
to the expressive quality of a performance 
of "Träumerei" and may be assumed to be 
given close attention by both performers and 
listeners. 

Figure 1. Piano score of Schumann's "Träumerei," arranged on the page so parallel structures are vertically aligned. 
The score was created after the Clara Schumann edition (Breitkopf & Härtel) using MusicProse software; minor 
deviations from the original are due to software limitations. 



The onsets of the tones corresponding to the six 
melody notes define five interonset intervals 
(IOIs) which would be equally long if the music 
were performed mechanically (e.g., by a 
computer). In fact, they are never equal in a 
human performance; pianists always give an 
expressive temporal shape to this crucial part of 
the melody. This temporal shape can be visualized 
as the pattern of observed IOI durations, plotted 
as connected points equidistant along the x-axis 
("score time"). How many such patterns are there? 
In principle, the melodic gesture can be performed 
with any temporal pattern whatsoever.² However, 
the hypothesis pursued here is that only certain 
patterns actually occur in expert performances 
and are found acceptable by listeners. 

One characteristic of this class of patterns may 
be predicted on the basis of the general principle 
of final lengthening (e.g., Lindblom, 1978; Todd, 
1985): A slowing down of tempo is often observed 
at the ends of action units such as phonological 
phrases in speech or melodic gestures in music, 
particularly when they coincide with the end of a 
larger structural unit, such as a clause 
(subphrase) or phrase. Therefore, the timing 
patterns to be investigated may be expected to 
show some lengthening of the last IOI(s). An 
independent reason for lengthening of the last IOI 
might be the occurrence of two grace notes 
(essentially a written-out arpeggio) in the left 
hand during that interval (see Figure 1). However, 
these grace notes occur only in bars 2, 6, and 18, 
not in bars 10, 14, and 22. The pattern of 
execution of these two sets of variants may differ. 
Another relevant phenomenon is the possible 
lengthening of accented tones. In the score, the 
fourth note of the melodic gesture follows a bar 
line and thus in theory carries a strong metrical 
accent (downbeat). Based on the notated music, 
therefore, a lengthening of the fourth intertone 
interval might be predicted. Musical intuition 
suggests, however, that this theoretical accent is 
suspended in performance, and that the accented 
tone of the melodic gesture is in fact the final one. 
Whether this is in fact so is an empirical issue to 
be addressed below. In principle, nothing can 
prevent a pianist from placing an overt accent on 
the fourth tone. 

In the remainder of this paper, a summary of 
performance measurements is followed by the 
detailed report of a perceptual experiment. The 
measurements derive from a comprehensive 
analysis of timing microstructure in performances 
of Schumann's "Träumerei"; for details, the reader 
is referred to Repp (1992). 



PERFORMANCE MEASUREMENTS 
Tone onset timing measurements were obtained 
from the digitized waveforms of 28 different 
performances of "Träumerei," taken from 
commercial recordings (LP, CD, or cassette) by 24 
pianists. Two famous pianists (Alfred Cortot and 
Vladimir Horowitz) were represented with three 
different recordings each. The measurements were 
averaged over the obligatory repeat of bars 1-8 
(observed by all but two pianists in the sample) 
before further analysis. Thus there were data for 
six instances of the melodic gesture of interest in 
each of the 28 performances, a total of 168. 

Initially, the geometric mean durations of the 
five IOIs for each of the six instances of the ges- 
ture were computed across the 28 performances. 
These durations (in ms) were plotted as a function 
of score time (i.e., at equal abscissa intervals), and 
their pattern was examined as to whether it could 
be fit by some simple function. These data are 
shown in Figure 2. It is evident that the timing 
pattern of each instance was fit well by a smooth 
curvilinear function, in fact a parabola (quadratic 
curve). Overall, pianists tended to speed up 
somewhat in the initial part of the melodic gesture 
and to slow down at the end. This slowing down 
was especially pronounced in the last instance of 
the melodic gesture (bars 21-22), where the score 
indicates a fermata (hold) on the last note. It was 
least pronounced in the two instances in the mid- 
dle section of the piece (bars 9-10 and 13-14). All 
instances, however, were described well by 
quadratic functions which differed mainly in cur- 
vature. 
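
The two computations described in this paragraph, a geometric mean across performances and a quadratic fit over score position, can be sketched in plain Python as follows. This is an illustration under our own assumptions, not the author's analysis code: the fit is an ordinary least-squares fit solved through the normal equations, and the sample IOIs are invented values that lie exactly on a parabola.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive durations."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def fit_quadratic(ys):
    """Least-squares fit of y = a + b*x + c*x**2 at x = 1..len(ys).

    Returns (a, b, c). Requires at least three points.
    """
    xs = range(1, len(ys) + 1)
    # Normal equations: sum_j s[i+j] * beta_j = t[i] for i = 0, 1, 2.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the 3x4 matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(3):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return tuple(row[3] for row in m)


# Invented IOIs in ms (not measured data): 500 - 60x + 15x^2 for x = 1..5.
iois = [455, 440, 455, 500, 575]
a, b, c = fit_quadratic(iois)
print(round(a, 3), round(b, 3), round(c, 3))
```

In this parameterization the curvature differences among the six instances noted in the text would show up as differences in the quadratic coefficient c.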




[Figure 2 plot. Abscissa: bar/eighth-note.] 

Figure 2. Timing patterns of six instances of the same 
melodic gesture in "Träumerei." The data points are the 
geometric average durations of 28 performances (Repp, 
1992), with quadratic functions fitted to them. The 
abscissa labels refer to bars 1-2. 
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Subsequently, all 168 individual timing patterns 
were plotted and examined in the same way. It 
was found that 87% of them could be described 
rather well by quadratic functions of varying 
elevation (i.e., average tempo) and curvature (i.e., 
degree of tempo modulation). All but two of the 
exceptions followed a single pattern: a relative 
shortening of the last IOI. This pattern, whose 
main representative was the French pianist Alfred 
Cortot in his three performances, suggests a 
different structural interpretation of the melodic 
gesture: a division into two subgestures and/or an 
intention to place an accent on the fourth tone. 
Three other pianists showed this pattern 
intermittently; Cortot himself consistently avoided 
it in bars 21-22, where he showed the standard 
parabolic timing curve. 

Further analysis of the coefficients of the 
quadratic polynomials (y = a + bx + cx^2) fit to 87% 
of all instances revealed some strong relationships 
among the constant (a), linear (b), and quadratic 
(c) terms of these functions. The latter two, in 
particular, were highly correlated. There was also 
a substantial correlation between the quadratic 
and constant terms. Linear regressions among the 
coefficients made it possible to predict the linear 
and constant terms from the quadratic term and 
thus to generate a single family of parabolas by 
varying the quadratic term alone. (This family is 
shown in the upper left-hand panel of Figure 4 
below.) It captures a substantial amount of the 
variance in the data, with deviations occurring 
mainly in the constant term (i.e., elevation along 
the ordinate, corresponding to variations in 
overall tempo), which is irrelevant to the temporal 
shaping of the melodic gesture. 

These quadratic curves represent a strong 
constraint on the timing pattern of the melodic 
gesture studied here. Apparently, the large 
majority of expert pianists achieve a parabolic 
timing function by controlling a single degree of 
freedom. No pianist lengthened the second IOI, 
say, or shortened the first, or showed any pattern 
(other than the type favored by Cortot) that 
deviated substantially from a parabolic trajectory 
(though see Footnote 2). Even the Cortot pattern 
followed a parabolic curve through the first four 
IOIs. To the author, however, these performances 
sound mannered. This subjective impression, in 
conjunction with the overall predominance of 
parabolic timing patterns, suggested that the 
more typical parabolic patterns might also be 
preferred by other musically experienced 
listeners. This hypothesis was tested in the 
following perceptual experiment. 



PERCEPTUAL EXPERIMENT 

The purpose of this experiment was to 
demonstrate that listeners' aesthetic preferences 
converge on the timing patterns that characterize 
the majority of expert performances. To that end, 
subjects were presented with the melodic gesture 
of interest, executed with a variety of timing 
patterns, each of which was to be rated for 
acceptability on a 10-point scale. The timing 
patterns included parabolic and "hybrid" 
(nonparabolic) shapes. Among the former, there 
were some that belonged to the family of functions 
observed in actual performances, whereas others 
deviated in the location of the minimum. It was 
expected that listeners would prefer the "normal" 
over the deviant parabolic shapes. In addition, 
these functions varied in curvature. Since 
listeners might also exhibit a preference for a 
particular curvature (degree of tempo modulation) 
within each class of temporal shapes, some 
deviant shapes might actually be preferred over 
some normal shapes. However, for a given degree 
of curvature, the normal shapes were expected to 
be preferred most. The hybrid shapes were 
generated from two normal parabolic patterns of 
different curvature by interchanging their IOIs in 
all possible ways. It was expected that listeners' 
judgments would reflect the hybrids' degree of 
approximation to a parabolic shape. The responses 
were also expected to yield information about 
what deviations from the normal shapes are more 
readily tolerated than others. In fact, one of these 
deviant shapes resembled the Cortot type of 
pattern. 

The role of listeners' musical experience was a 
rather crucial issue. If the parabolic constraint 
uncovered in the performance timing 
measurements reflects a general principle of 
physical motion, i.e., an optimal pattern of 
acceleration-deceleration, then even listeners 
without much musical experience might show a 
preference for it. The alternative possibility is that 
listeners need to be attuned to temporal patterns 
in classical music performance to show reliable 
preferences in this task. To investigate this issue, 
subjects both with and without musical experience 
were tested. A second question, concerning 
subjects with musical experience, was whether 
their judgments would be based on general 
knowledge of performance principles in classical 
music, or on specific knowledge of "Träumerei" 
and its performance. This question was not 
addressed rigorously, but some relevant 
information was obtained. 
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Methods 

Subjects. Twenty-six subjects participated. The 
majority of them had responded to an 
advertisement in the Yale campus newspaper; 
others were recruited personally by the author 
and included some friends and family members 
who served without pay. Twelve subjects had little 
musical education; most of them did not play any 
instrument, while some had studied an 
instrument for a short time. Fourteen subjects 
were musically experienced; they included 11 
pianists, two violinists, and one flutist, ranging in 
skill from advanced amateur to professional level. 

Stimuli. The stimuli were generated on a 
Roland RD250S digital piano under MIDI control. 
Temporal resolution was 5 ms. Each stimulus 
consisted of the excerpt shown in Figure 3 (from 
bars 1-2 of "Träumerei"), played with one of 45 
different timing patterns. The timing pattern was 
applied only to the melodic gesture of interest, 
which comprised five IOIs; the timing of the 
preceding context, comprising three longer IOIs, 
was constant at values representing the geometric 
means of the 28 expert performances measured by 
Repp (1992): 1065, 1380, and 1825 ms, 
respectively. The timing of the left-hand grace- 
note tones during the last IOI of the critical 
gesture was such that the first tone started after 
one third of the IOI had elapsed and ended with 
the onset of the second tone, which started after 
one third of the remaining interval had elapsed. 
(This timing pattern was fairly common in the 28 
performances examined.) To make room for the 
grace-note tones, the tied-over quarter-notes of 
the preceding chord in bar 2 were realized as 
tied-over eighth-notes. 
Sustaining pedal was added as indicated in the 
score. The tones had a fixed expressive intensity 
pattern similar to that of one of the expert 
performances. 

The timing patterns of the critical melodic 
gesture are illustrated in Figure 4. The upper left- 
hand panel shows the five "normal" patterns, 
which followed parabolic functions of varying 
curvature. Each parabola was generated by the 
equation, IOI (ms) = C + Lx + Qx^2, where x stands 
for the ordinal numbers of the IOIs (1,...,5). The 
quadratic term (Q) of the polynomial equation was 
set at values of 20, 40, 60, 80, and 100, which span 
the range of most empirically observed timing 
functions. The linear (L) and constant (C) terms of 
the parabolas were derived according to the 
empirically determined regression equations, L = 
35 - 5.5Q and C = 388 + 7.8Q (see Repp, 1992). 
The resulting stimuli were named Q20,..., Q100. 
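
The five normal patterns can be regenerated from the generating equation and the two regression equations. In the sketch below the sign of the 7.8Q term is taken to be positive; this is our reading, chosen because it is the only sign that keeps every IOI positive for all five values of Q. The function name is illustrative, and the output is not rounded to the 5-ms temporal resolution of the instrument.

```python
def normal_pattern(q):
    """IOIs in ms for x = 1..5 from IOI = C + L*x + Q*x**2."""
    lin = 35 - 5.5 * q        # L = 35 - 5.5Q
    const = 388 + 7.8 * q     # C = 388 + 7.8Q (positive sign assumed)
    return [const + lin * x + q * x * x for x in range(1, 6)]


for q in (20, 40, 60, 80, 100):
    print(f"Q{q}:", [round(v) for v in normal_pattern(q)])
```

Under these assumptions the flattest pattern (Q20) spans roughly 470 to 670 ms and the most strongly curved one (Q100) roughly 520 to 1090 ms, with the minimum falling on the second or third IOI.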

The lower panels in Figure 4 illustrate two sets
of deviant parabolic curves. Each set varied in Q
along the same values as the normal set, but the
constant and linear terms differed. In the "left-
shifted" set, C was decreased by 300 and L was
increased by 100, whereas, in the "right-shifted"
set, C was increased by 300 and L was decreased
by 100. In each case, the change in one parameter
was arbitrary, but the change in the other
parameter was chosen so as to keep the average
IOI duration equal to that of the normal condition
with the same Q. The stimuli in the left-shifted
set (Q20L, ..., Q100L) started faster and ended
slower than the normal stimuli; the opposite was
true for the stimuli in the right-shifted set
(Q20R, ..., Q100R).
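
Why a change of 300 in C is balanced by an opposite change of 100 in L can be checked numerically: the mean of x over the five positions is 3, so the average IOI changes by the change in C plus three times the change in L, which is zero here. The sketch below is a hedged illustration; the function names and the `shift` parameter (-1 for left-shifted, 0 for normal, +1 for right-shifted) are assumptions made for this example.

```python
def parabola(q, shift=0, n_iois=5):
    """IOIs (ms) for curvature q; shift=-1 left-shifted, 0 normal, +1 right-shifted."""
    const = 388 + 7.8 * q + 300 * shift   # C: -300 for left, +300 for right
    lin = 35 - 5.5 * q - 100 * shift      # L: +100 for left, -100 for right
    return [const + lin * x + q * x ** 2 for x in range(1, n_iois + 1)]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


if __name__ == "__main__":
    for q in (20, 40, 60, 80, 100):
        normal, left, right = parabola(q), parabola(q, -1), parabola(q, +1)
        # The three means agree; only the distribution over positions differs.
        print(q, round(mean(normal)), round(mean(left)), round(mean(right)))
```

With these settings the first IOI of each left-shifted pattern is 200 ms shorter, and the last 200 ms longer, than in the normal pattern with the same Q; the right-shifted patterns show the reverse.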




Figure 3. The musical excerpt used in the experiment (from bars 1-2 of "Träumerei," with slightly modified final
notes). The melodic gesture of interest is boxed in.

Figure 4. Timing patterns of the experimental stimuli. Upper left-hand panel: normal parabolic patterns. Lower left-
hand panel: left-shifted parabolic patterns. Lower right-hand panel: right-shifted parabolic patterns. Upper right-hand
panel: hybrid patterns.

The remaining 30 timing patterns were gener-
ated as illustrated in the upper right-hand panel
of Figure 4. The heavy lines in the figure illustrate
the Q20 and Q100 timing patterns, represented
here as polygons rather than as smooth curves.
Thirty hybrid patterns were generated by inter-
changing IOI durations from those two patterns.
With two possible values for each of five IOIs,
there are 32 possible patterns, two of which are
the original ones. The original patterns were
coded arbitrarily as H00000 (= Q20) and H11111
(= Q100), and hybrid patterns were coded as
H10000, H11000, etc. Clearly, some of these
hybrids (e.g., H00100, H11011) were very similar
to the originals, whereas others were more
dissimilar. Although some of them were clearly
nonparabolic (e.g., H01010), others might be fit by
a left-shifted or right-shifted parabola (H00011
and H11100, respectively). In contrast to the left-
and right-shifted parabolic patterns, however, all
individual IOIs in the hybrid patterns were within
the normal range. One hybrid pattern, H11110,
was not unlike the Cortot pattern described above.
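
The coding scheme lends itself to a short enumeration. The following sketch is illustrative rather than a record of how the stimuli were actually produced: it recomputes the two parent patterns from the parabola equation given earlier in this section, and the names `hybrid_iois`, `ALL_CODES`, and `HYBRID_CODES` are assumptions introduced for the example.

```python
from itertools import product


def normal_parabola(q, n_iois=5):
    """IOIs (ms) of the normal parabola with curvature q (x = 1..n_iois)."""
    lin = 35 - 5.5 * q
    const = 388 + 7.8 * q
    return [const + lin * x + q * x ** 2 for x in range(1, n_iois + 1)]


SHORT = normal_parabola(20)    # parent pattern Q20, coded H00000
LONG = normal_parabola(100)    # parent pattern Q100, coded H11111


def hybrid_iois(code):
    """Map a code such as 'H01010' to its five IOIs (ms).

    A '0' in a position takes that IOI from Q20, a '1' takes it from Q100.
    """
    bits = code[1:]
    return [LONG[i] if bit == "1" else SHORT[i] for i, bit in enumerate(bits)]


# All 2**5 = 32 codes; removing the two parents leaves the 30 hybrids.
ALL_CODES = ["H" + "".join(bits) for bits in product("01", repeat=5)]
HYBRID_CODES = [c for c in ALL_CODES if c not in ("H00000", "H11111")]

if __name__ == "__main__":
    print(len(ALL_CODES), len(HYBRID_CODES))
    print("H11110:", [round(ioi) for ioi in hybrid_iois("H11110")])
```

Because every IOI is copied from one of the two normal parents, each individual IOI of a hybrid necessarily lies within the normal range, even when the overall shape is not parabolic.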

The stimuli were recorded electronically from
the audio output jack of the digital piano onto
high-quality cassette tape. Six examples were
recorded at the beginning of the tape, the first
three with isochronous timing of the melodic ges-
ture (i.e., with constant IOIs of 500 ms), and the
second three being stimuli H01101, Q80R, and
Q40. These examples were followed by three dif-
ferent randomized sequences of the 45 stimuli.
Interstimulus intervals were 5 s, with an addi-
tional 5 s after each group of 15, and another 5 s
between blocks.

Procedure. Subjects received a dubbed copy of 
the master cassette, accompanied by detailed 
printed instructions, an answer sheet, and a 
questionnaire about their musical experience. 
They listened on their home audio equipment and
returned the completed materials. (Control over
sound quality and playback level was not crucial
in this study.)

The instructions displayed the score of the
excerpt (cf. Figure 3) and included the following 
crucial sections: 

"...Each time the excerpt will be played with a
slightly different timing pattern of the notes. Your
task is to judge the aesthetic appeal of each timing
pattern. Clearly, there are no right or wrong
responses here; I want to find out what sounds
good to you...."

(After the first set of examples had been
introduced:)

"...In the following three examples, the eighth-
notes vary in duration, as they would in a human
performance. Each of the three examples has a
different timing pattern, and they may not (in fact,
should not) sound equally good to you. Clearly,
there are some timing patterns that are preferable to
others. ... In the following test, you will indicate
[your] preference by giving a numerical rating
between 1 and 10 to each excerpt you hear, where
10 is the best possible rating and 1 is the worst ...
However, don't use these [ratings] in an absolute
sense, but try to adjust to the diversity of timing
patterns you hear and use the whole scale; that is,
give ratings of 9 or 10 to the best patterns you hear
in the course of this experiment, and ratings of 1 or
2 to the worst, regardless of how you might judge
these patterns in an absolute sense. Avoid giving
too many ratings in the middle range; try to use the
extremes as well...."

Nearly all subjects in fact used the whole range 
of rating categories. 

Results and Discussion

Consistency of judgments. The first question to 
ask was whether the subjects were able to perform
the task — that is, give reliable judgments. The 
reliability of their ratings could be determined by 
correlating the ratings across the three blocks of 
stimuli. Since the first block served to familiarize 
subjects with the stimuli, the correlation between 
the second and third blocks was expected to be 
higher than that between the first block and 
either of the other two. However, although this
was true for some individual subjects, there was 
no such overall tendency in the data, and the 
three interblock correlations were therefore 
averaged for each subject. 
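
The reliability measure described here, the average of the three pairwise correlations between blocks, can be sketched as follows. The ratings in `example_blocks` are hypothetical numbers made up for illustration (they are not data from the study), the function names are assumptions, and the ratings within each block are assumed to have been re-sorted into a common stimulus order, since the stimuli were presented in a different random order in each block.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_interblock_r(blocks):
    """Average correlation over all pairs of blocks (three pairs for three blocks)."""
    rs = [pearson(a, b) for a, b in combinations(blocks, 2)]
    return sum(rs) / len(rs)


# Hypothetical ratings by one subject of the same six stimuli in three blocks,
# already aligned by stimulus (not data from the experiment).
example_blocks = [
    [7, 2, 5, 9, 4, 6],
    [8, 1, 5, 8, 3, 6],
    [6, 3, 4, 9, 5, 7],
]

if __name__ == "__main__":
    print(round(mean_interblock_r(example_blocks), 2))
```

In the experiment itself each block contained 45 ratings per subject, so each of the three correlations would be computed over 45 stimulus pairs.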

All 14 musically experienced subjects exhibited 
significant average correlations, ranging from 0.31 
(p < .05) to 0.79 (p < .0001). Of the 12 musically 
inexperienced subjects, however, only two showed 
a significant average correlation (0.49 and 0.51, 
respectively, both p < .001); for the rest, the 
correlations ranged from -0.01 to 0.18. This is a 
very striking difference. Most musically untrained 
subjects apparently did not possess a stable 
criterion by which to judge the stimuli. 

A second criterion that separated the two
subject groups was their response to timing
pattern Q20L. As was evident to the author
during stimulus generation, this pattern (as well
as Q40L) sounded really ridiculous, in contrast to
the other patterns, which seemed at least
moderately acceptable. Indeed, all musically
experienced subjects assigned their lowest ratings
to Q20L, with average ratings ranging from 1.0 to
2.0. Ten of the 12 musically inexperienced
subjects, however, gave this stimulus average
ratings between 5.0 and 9.33! The remaining two 
subjects gave average ratings of 2.67 and 3.0, 
respectively; however, they were not the two 
individuals who showed significant reliability of 
judgments. 

Because of this striking dichotomy in 
judgmental criteria and consistency between the 
two subject groups, further analysis was restricted 
to the data of the 14 musically experienced 
subjects. Their responses to the parabolic and 
hybrid patterns were analyzed separately. 

Parabolic patterns. The parabolic patterns 
constituted a 3 (Type) by 5 (Curvature) design. 
The subjects' ratings were averaged over the three 
blocks and subjected to a two-way repeated- 
measures ANOVA. The average ratings are 
plotted in Figure 5.

Figure 5. Average ratings given to the parabolic patterns. (Horizontal axis: curvature, Q.)

As is evident from the figure, the prediction that 
the normal parabolic curves would receive the 
highest ratings was confirmed. The main effect of
Type was highly significant [F(2,26) = 65.60, p <
0.0001]. There was also a significant main effect of
Curvature [F(4,52) = 5.50, p < 0.001], though it
was irrelevant in view of a strong two-way
interaction [F(8,104) = 20.95, p < 0.0001]. This
interaction was evidently due to the very different
effect of Curvature for left-shifted parabolas than
for normal and right-shifted ones.

The latter two stimulus types were analyzed in
a separate ANOVA. There were significant effects
of Type [F(1,13) = 40.24, p < 0.0001] and of
Curvature [F(4,52) = 11.26, p < 0.0001], but a
nonsignificant interaction [F(4,52) = 1.44, p =
0.24]. Normal parabolas were rated more highly
than right-shifted ones at all degrees of curvature,
and for both types the most preferred curvature
was Q40. By contrast, left-shifted parabolas were
judged extremely unfavorably at low curvatures
(as noted earlier) and more favorably at high
curvatures. At Q100, left-shifted functions were
almost as acceptable as normal ones, and more
acceptable than right-shifted ones. This indicates
that the subjects were particularly averse to
hearing a short first IOI; for stimuli Q60L to
Q100L, the reduction of the starting tempo
apparently compensated for the exaggeration of
the final slow-down.

Hybrid patterns. The 30 hybrid patterns,
together with the parent patterns Q20 and Q100,
formed a 2 x 2 x 2 x 2 x 2 design: Each of the five
IOIs could either have a short duration (from Q20)
or a long duration (from Q100). These five
positions will be referred to by the letters A, B, C,
D, E in the following. A 5-way repeated-measures
ANOVA was conducted on the subjects' ratings.
Significant main effects in this analysis would
indicate that the listeners preferred a shorter or
longer IOI duration in particular positions. Such
effects were more likely in the positions where
Q20 and Q100 differed most, i.e., E and A (cf.
Figure 4). Of greater interest were any
interactions among the five position factors, which
would indicate that the relationships among
several IOIs mattered. The average ratings are
shown in Table 1.

Only one of the five main effects reached signifi-
cance, that of position A [F(1,13) = 8.10, p < 0.02]:
Listeners preferred the shorter IOI in that posi-
tion (see Table 1, bottom row).^4 The main effect
for the last position, E, was nonsignificant [F(1,13)
= 1.06, p < 0.32], even though the change in
duration was larger (424 ms vs. 264 ms). This is
interesting in view of the "Cortot pattern"
mentioned earlier, in which the last IOI is
abnormally shortened; apparently, the present
listeners were not very consistent in their
responses to different degrees of final
lengthening.^5

Table 1. Average ratings given to the hybrid patterns.

Code      Rating        Code      Rating        Code      Rating        Code      Rating
H00000    7.4           H01000    6.5           H10000    5.4           H11000    6.9
H00001    7.6           H01001    [illegible]   H10001    [illegible]   H11001    [illegible]
H00010    6.8           H01010    6.1           H10010    5.2           H11010    5.6
H00011    6.0           H01011    6.1           H10011    4.6           H11011    5.3
H00100    7.1           H01100    6.7           H10100    5.7           H11100    5.5
H00101    7.0           H01101    6.1           H10101    5.3           H11101    6.0
H00110    6.8           H01110    6.6           H10110    4.7           H11110    5.7
H00111    6.8           H01111    6.8           H10111    4.2           H11111    5.4

H00...    6.9           H01...    6.4           H10...    5.0           H11...    5.7
H0...     6.7           H1...     5.4

There were several significant interactions,
however, which indicated that listeners did not
judge IOI durations individually. The three
largest interactions were AB [F(1,13) = 16.59, p <
0.002], ABCD [F(1,13) = 16.15, p < 0.002], and CE
[F(1,13) = 12.00, p < 0.005]. Four additional
interactions, ACD, BCD, BD, and BDE, were
significant at p < 0.05, and three further
interactions, ACDE, AECE, and BCDE, were
nearly significant (p < 0.06). It is perhaps
noteworthy that the only two positions that were
never involved together in a significant
interaction are A and E. The beginning and the
end of the timing pattern thus seemed to be
judged independently.

These interactions indicate that it is the pattern
of IOIs that mattered, not individual IOI
durations. The AB interaction, for example, shows
that a shorter second IOI (B) was preferred when
the first IOI (A) was short, but a longer B was
preferred when A was long (see Table 1,
penultimate row). The BD and CE interactions
show a similar pattern of preferred positive
covariation between two positions. Now consider a
more complex interaction, ABCD, which subsumes
two other significant interactions, ACD and BCD.
It can be viewed as four CD interactions, one for
each of the four combinations of A and B values.
Three of these four two-way interactions exhibit
the positive covariation described above, but one
(that for long A and short B) exhibits negative
covariation. That is, in that specific condition
listeners preferred a long C when D was short,
and a short C when D was long. The reason for
this complex interaction is not obvious, but it is
remarkable that position C, which had a duration
difference of only 24 ms, was so strongly involved
in it. Listeners' sensitivity to small deviations in
that position mirrors the restricted range of
observed IOI durations.
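
The "positive covariation" behind the AB interaction can be made concrete with the four marginal means in the penultimate row of Table 1 (6.9, 6.4, 5.0, and 5.7 for short or long A combined with short or long B). The sketch below only illustrates the arithmetic of a 2 x 2 interaction contrast; it is not the repeated-measures ANOVA reported in the text, and the function names are assumptions.

```python
# Mean ratings by duration of the first (A) and second (B) IOI, as read from
# the penultimate row of Table 1 (H00..., H01..., H10..., H11...).
mean_rating = {
    ("short", "short"): 6.9,
    ("short", "long"): 6.4,
    ("long", "short"): 5.0,
    ("long", "long"): 5.7,
}


def effect_of_long_b(a_level):
    """Change in mean rating when B goes from short to long, for a given A."""
    return mean_rating[(a_level, "long")] - mean_rating[(a_level, "short")]


def ab_interaction_contrast():
    """Difference between the B effect at long A and the B effect at short A."""
    return effect_of_long_b("long") - effect_of_long_b("short")


if __name__ == "__main__":
    print("B effect when A is short:", round(effect_of_long_b("short"), 1))
    print("B effect when A is long: ", round(effect_of_long_b("long"), 1))
    print("AB interaction contrast: ", round(ab_interaction_contrast(), 1))
```

Lengthening B lowers the mean rating by 0.5 points when A is short but raises it by 0.7 points when A is long, so the two simple effects differ by 1.2 points; a nonzero difference of this kind is what the AB interaction term captures.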

Another prediction may be examined in the 
hybrid pattern data. The parent patterns, Q20 
and Q100, were parabolas of the normal type. The
hybrid patterns approached parabolas in various 
degrees. Therefore, none of them should have been 
rated higher than the parent patterns, whereas 
quite a few should have been rated lower. 
However, the difference in average ratings 
between Q20 and Q100 of about 2 points (cf.
Figure 5) must be taken into account. The revised 
prediction, therefore, is that no hybrid pattern 
should have been judged more acceptable than 
Q20, but some should have been judged less 
acceptable than Q100.

The first part of the prediction was confirmed:
Only one hybrid stimulus, H00001 (i.e., Q20 with
a lengthened final IOI), received a higher average
rating than Q20 (7.6 vs. 7.4), but this difference
was certainly not significant. The second part of
the prediction was also supported: There were 7
hybrid stimuli that received lower average ratings
than Q100 (5.4). The lowest rated stimulus,
H10111 (4.2), corresponded to Q100 with a
shortened second IOI. In fact, Table 1 shows that
all stimuli of the type H10... received low ratings
whose range (4.2 to 5.7) did not overlap at all with
the ratings of H00... and H01... stimuli (range:
6.1 to 7.6); H11... stimuli were in between
(range: 5.1 to 6.9). This again confirms the relative
importance of the first IOI in relation to the
second. Clearly, listeners did not like a relatively
long first IOI. This is easily explained by the tonal
structure: The first tone of the melodic gesture, E,
is a half-step below the tonic (i.e., a "dissonant
lower neighbor") and, moreover, in a metrically
weak position, which calls for a quick resolution.

It might also be asked whether the ratings of
the hybrid stimuli reflected the degree to which
they approached a parabolic timing curve. As the
results for parabolic patterns show, however, what
matters is not so much the parabolic shape itself
as its parameters. That deviations from a
parabolic shape can be tolerated is illustrated by
the ratings for hybrid stimulus H11110 (5.7),
which were slightly higher than those for the
perfectly parabolic stimulus H11111, alias Q100
(5.4). H11110 resembles the "Cortot pattern," and
Cortot may have taken advantage of listeners'
tolerance for variations in the final IOI. The
resulting timing shape is only moderately
acceptable, however, which matches the author's
impression from listening to Cortot's recordings
(whatever other qualities they may have).

GENERAL DISCUSSION 
The present results are limited in a number of 
ways, which will be discussed below. Within these 
limitations, however, they provide a clear indica- 
tion of a constraint on performance timing that is 
shared by expert performers and musically accul- 
turated listeners. While it may be perfectly obvi- 
ous to some theorists that such constraints must 
exist, their objective demonstration and character- 
ization has rarely been undertaken before. The lo- 
cal constraint examined here is flexible enough to 
permit a large variety of concrete timing patterns; 
yet there is reason to believe that, in a specific 
musical context, a single pattern may be optimal. 
Because of the contextual timing variation inher- 
ent in different performances, the evidence for op- 
timality comes from the perceptual data alone. 
For the specific musical excerpt presented here, 
the timing shape labeled Q40 seemed to be best, 
on the average. 

It is necessary to discuss now what possible 
generality this finding may have. Three major is- 
sues concern individual differences in preferences 
and experience, the specific stimulus conditions of 
the experiment, and the specific musical excerpt 
selected. 

Individual differences among listeners did exist, 
of course, as they do in nearly all psychological 
studies. However, the high levels of significance of 
some of the effects obtained suggest considerable 
agreement. More extensive replications of 
judgments per subject would be needed to 
interpret individual differences. A few 
observations are offered here: All subjects but one 
gave some of their highest ratings to parabolic 
patterns of the normal type; the exception was a 
professional pianist who gave her highest ratings
to stimuli H11110 (the Cortot-like pattern),
H11010, and Q40R, which all shared an initial
accelerando but had a reduced ritardando at the
end. Among the normal parabolic patterns, most
subjects' preference fell on patterns with lower
curvature (Q20, Q40, or Q60), though two
subjects, both accomplished pianists, preferred
those with higher curvature. One subject,
interestingly the youngest in the group (an 11-
year-old girl who studies the piano), did not
differentiate much among the different degrees of 
curvature, though she clearly preferred the 
normal patterns over the left- and right-shifted 
ones. How the internal standards by which such 
patterns are judged are acquired in the course of 
music education is of course a very interesting 
question for future research. 

That musical experience is a sine qua non for
reliable performance in the experimental task was 
demonstrated convincingly here. The precise 
nature of the necessary experience is less clear, 
however. The subject sample did not include 
individuals who cannot play an instrument but 
listen extensively to classical music; the musically
experienced subjects were all instrumentalists of
varying degrees of proficiency. The several
professional pianists in the group, who surely had 
the most extensive musical education, actually 
were not the most reliable judges. It is entirely 
possible that professional musicians' criteria are 
less fixed than those of amateurs and ordinary 
music lovers, because constant interaction with 
other musicians as students, ensemble players,
and teachers may encourage tolerance of a large 
variety of interpretative nuances. 

It seems unlikely that specific knowledge of
"Träumerei" and exposure to performances of this
music in the past played a significant role in sub-
jects' judgments. Most subjects were very familiar
with the piece, but some were not. One subject, a
flutist, indicated that she did not know it at all;
two others, who are string players, and the 11-
year-old pianist indicated they were "fairly" famil-
iar with it. Yet, these subjects gave reliable judg-
ments consistent with those of the other musi-
cians. The more important argument is one of
plausibility: Although some surface characteristics
of previously heard performances may well be part
of the memories of familiar pieces of music, per-
formance rules must also be stored in a more ab-
stract form, so as to be applicable to music never
heard before. Musically experienced listeners
surely can judge the performance quality of novel
music in a familiar style, just as performers can
sightread new music with good expression, again
provided the style is familiar (as it is in the case of
any piece from the Romantic period).

Subject variables thus do not seem to impose a 
serious limitation on the generality of the present 
results. Stimulus variables are more of a problem. 
There are at least three factors that may influence 
subjects' judgments but were kept constant in the
experiment. One is the melodic and harmonic con-
tent of the musical excerpt. As pointed out in the
section on performance measurements, the
melodic gesture under examination occurs six
times in "Träumerei," and only two of these in-
stances are exactly identical. As Figure 2 showed,
the average timing curve of the excerpt used in
the rating task (which occurs in bars 1-2 and 17-
18) has only moderate curvature, comparable to
the Q40 stimulus. A lower curvature was typical
of the variants in the middle section of the piece,
while high curvatures were mainly associated 
with the last instance preceding the fermata. Thus 
the listeners indeed preferred the curvature ap- 
propriate to the excerpt offered, but they may well 
prefer a different curvature for other variants. 
However, the preference for normal parabolic 
shapes should hold across all variants. 

A second factor is the timing (and the implied 
tempo) of the context in which the critical melodic 
gesture was presented. The IOIs of the preceding
musical events were set somewhat arbitrarily at
the geometric means of the performance sample.
It is possible, even likely, that a different choice of
IOIs for the context would have influenced
subjects' preferences. For example, if the IOIs had
been longer (implying a slower tempo), listeners
may have opted for a more curved or elevated 
timing function. This would be interesting to test 
in future experiments. As it was, however, 
listeners were presented with an average 
contextual timing pattern, and they preferred a 
curvature that also corresponded to the average, 
which seems appropriate. Their general 
preference for normal parabolas should be 
independent of variation in contextual timing. 

A third factor is the intensity microstructure of 
the melodic gesture, which was also held fixed. It 
was derived from an individual performance, and 
its contour may not have been close to the 
average.^6 It did constitute a crescendo, however,
as marked in the score. Very little is known at this 
time about the perceptual interdependence, if any, 
of timing and intensity microstructure. It is 
conceivable that a different intensity contour 
would change subjects' curvature preferences. 
Again, however, there is no reason to believe that
the subjects would prefer atypical timing patterns
as long as the intensity microstructure stayed
within the normal range of variation.

A final consideration is the selection of timing
functions presented in the experiment. Clearly, 
there are many possible shapes that were not 
included, mainly because they were expected to 
sound terrible and might have offended musical 
listeners' sensibilities. This is not a serious 
omission. On the other hand, it is conceivable that 
there are timing curves superior to Q40 in this
particular context. The left- and right-shifted 
parabolas constituted fairly gross deviations, and 
there are other functions closer to the normal ones 
that, in a sufficiently sensitive perceptual test, 
might prove even more highly acceptable. It must 
also be noted that implicit tempo (which is 
difficult to quantify in a temporally modulated 
performance; see Repp, 1992) was confounded
with curvature to some extent, Q100 having a
slower tempo than Q20. Listeners' overall 
preference for Q40 may have constituted a 
preference for (contextually appropriate) tempo as 
much as for curvature. This would have to be 
sorted out by varying the constant and quadratic 
parameters of the timing curves independently. 

In summary, consideration of various stimulus-
related factors suggests that listeners' preference
for a particular curvature of the timing function 
may well be context-dependent; however, their 
general preference for normal parabolic shapes 
most likely is not. It should also be remembered 
that the normal family of parabolic shapes was 
derived from a set of performances that varied 
widely in the performance parameters (tempo, 
contextual timing, intensity microstructure) 
whose possible role in perception was just 
considered. The generality of the parabolic 
constraint across this performance variation 
should have a parallel in perceptual preference 
across similar variation. 

This leads to the broader question concerning 
the generality of the parabolic constraint to other 
kinds of melodic gestures and musical styles. One 
obvious limitation is that the constraint can
meaningfully apply only to melodic gestures that
have at least four IOIs. The more IOIs, the
more strongly the constraint may manifest itself.
Repp (1992) examined the timing patterns of three
other melodic gestures in "Träumerei," each
comprising 4 IOIs near the end of a phrase; they,
too, seemed to follow the constraint, but somewhat
less consistently than the 5-IOI gesture examined
here. Gestures with fewer than 4 IOIs, of course,
cannot violate the parabolic constraint; they are
simply irrelevant to it.

Another limitation is that the gestures may
need to have a ritardando in them. This was true
for all the instances examined by Repp (1992).
Moreover, the present results are in strong
agreement with the performance and perception
results of Sundberg and Verrillo (1980), who
focused on final ritardandi in Baroque music. The
parabolic constraint thus may characterize
ritardandi at all levels of the grouping structure,
and quite possibly across different musical styles.
It may indeed represent a "natural" way of
changing tempo, including both accelerando and
ritardando, though the evidence for accelerando is
limited to the initial part of the melodic gesture
examined here.

These tempo changes, moreover, must be 
uninterrupted. This perhaps constitutes the most 
serious limitation of the parabolic constraint. It 
may only apply to gestures that are rhythmically 
uniform and do not contain tones that receive 
special emphasis for harmonic or melodic reasons. 
If so, it characterizes only a small minority of the 
melodic gestures in a musical piece, though they 
may be the most salient ones, which mark the
ends of major sections. This minority, however, 
turns into a majority if all short melodic gestures 
in which the constraint applies trivially are 
included. It is noteworthy that Todd (1992), in the 
process of extending his coarse-grained model of 
expressive timing (Todd, 1985) to detailed local 
timing patterns, has been assuming a linear 
velocity function of tempo change for melodic 
gestures ("segments") of any length, apparently
with good success. A linear velocity change is 
equivalent to a quadratic timing function for the 
raw IOIs during a unidirectional tempo change.
Previously, Todd (1985, 1989) presented data 
suggesting that the global timing shapes of whole 
phrases can be modelled by a family of parabolic 
functions. His current, somewhat modified 
conception promises to constitute a valid basis of a
general performance model. 

The parabolic functions used in Repp (1992) and
in the present study were empirically derived and
may eventually have to give way to similar but
theoretically motivated functions such as proposed
by Todd (1992), provided that they fit the data
equally well. The extramusical origin of
constraints on performance timing is still a matter
of speculation, but it is likely to lie in aspects of
physical movement that have invaded musical
performance and ultimately account for the
frequent allusions to "musical motion" in the
musicological literature. Although musical motion
is often attributed to tonal sequences without
explicitly appealing to performance, it seems
likely that music needs to be set into motion by a
performer, real or imaginary. Once the physical
movement has entered the music, it will in turn be
able to "move" a listener, provided it has the
properties that the sensitive listener is attuned to.
The kinds of melodic gestures that are most
"moving" in a good performance are probably
those that give the timing constraint a chance to
emerge clearly and impress itself on the listener.

FOOTNOTES

*Music Perception, in press.

^1 These weaknesses include preselection of performances with
"typical" ritardandi, averaging across heterogeneous musical
materials, a rather unbalanced and poorly described design in
the perception experiment, and great variability in listeners'
judgments.

^2 That is, as long as the pattern does not lead musically literate
listeners to conclude that a rhythmic mistake has been made. In
other words, the performance timing pattern must be compatible
with the notated temporal pattern. There is, of course, a grey
area here in which faithfulness to the score may be a matter of
opinion.

^3 The remaining two exceptions, both occurring in the perfor-
mance of Brazilian pianist Cristina Ortiz, exhibited a relative
lengthening of the third IOI instead (i.e., a W-shaped pattern).

^4 This seems to contradict the earlier observation that subjects
disliked a short first IOI in left-shifted patterns. Note, however,
that the first IOI of stimulus Q20 corresponded to that of
stimulus Q60L (cf. Figure 4). Thus, while listeners disliked
abnormally short first IOIs, within the normal range they
preferred short over long first IOIs.

^5 It should not be inferred that the subjects were unable to detect
the difference in duration of the final IOI (424 ms). The present
task was one of aesthetic, not sensory discrimination.

^6 The methods of deriving and transferring the intensity values
will not be defended here, as they are still in need of validation.
Suffice it to say that the dynamic variation sounded appropriate
to the author. An analysis of the intensity microstructure of the
entire sample of 28 "Träumerei" performances remains to be
conducted.

A Review of Carol Krumhansl's
Cognitive Foundations of Musical Pitch*

Bruno H. Repp



The psychology of music perception and 
cognition, nearly dormant 15 years ago, has made 
considerable strides in the last decade. Several 
textbooks and edited collections of articles have 
appeared, two new journals have been 
established, societies have been formed, and many 
reports of empirical studies have been published, 
including one in monograph form (Serafine, 1988). 
However, the field is still new and small,^ and few 
researchers have had the persistence and the good 
fortune of continuous grant support to develop and
bring to fruition an extended and coherent 
program of research. Carol Krumhansl, of Cornell 
University, has accomplished that feat, and her 
monograph documents a decade of individual 
achievement, resulting in a critical mass of 
psychological data organized in a tight conceptual 
framework. This publication is a landmark event 
in a young field striving for definition and 
recognition. 

The brilliance of Krumhansl's approach was
recognized early on by her peers who bestowed on
her the American Psychological Association's
Distinguished Scientific Award for an Early
Career Contribution in Psychology (see American
Psychologist, March 1984, pp. 284-286). The
citation honored her for "a dazzling interplay of
experimental techniques, music theory, and
multidimensional scaling" that had uncovered
"new cognitive structures of great richness and
beauty" (ibid., p. 284). This methodological
virtuosity as well as its satisfying results are
evident throughout the book. Although nearly all
the results have been published previously in
accessible journals, they are brought together here
for the benefit of the reader, who is led through 
the complex issues by lucid explanations and 
discussions. The clear organization and sense of 
direction make reading the book an aesthetic as 
well as an intellectual pleasure. 



The book is divided into 11 chapters. Chapter 1 
introduces the reader to the author's objectives
and methods. Some general remarks about the
approach of cognitive psychology are provided for
nonpsychologists. The general aim is "to describe
the human capacity for internalizing the
structured sound materials of music by
characterizing the nature of internal processes
and representations" (p. 6). The more specific aim
of Krumhansl's research is "to describe what the
listener knows about pitch relationships [mainly
in traditional Western music], how this knowledge
affects the processing of sounded sequences, and
how this system arises from stylistic regularities
identifiable in the music" (p. 9). Krumhansl's
representations is to depict the similarity 
relationships within sets of basic elements as 
distances in a multidimensional space. 

The basic elements are said to be single tones, 
chords, and keys. (That keys are rather more 
abstract entities than are tones and chords is not 
immediately pointed out, but perhaps obvious.) 
Krumhansl does not further defend this axiom, 
which would be accepted by most music 
psychologists and musicologists. Witness, 
however, what Serafine (1988), not cited in the 
book, had to say: "In this view, the elements and 
processes of cognition will be exactly isomorphic to 
the factors we are able to find ... and manipulate 
in experiments" (p. 26) and "we know that the 
stimuli used in such studies are never, under any 
circumstances, considered or listened to as music" 
(p. 25). And, further along, Serafine argued that 
^uch psychological research has mistakenly 
focused exclusively [on], and also misinterpreted, 
merely the results of reflection, that is, scales, 
chords, and discrete pitches, rather than been 
concerned with music itself" (p. 52). I will return 
to these arguments at the end of this review. 

Chapter 2 introduces the reader to the concept 
of tonal hierarchy, and to Krumhansl's way of 
deriving and depicting this cognitive structure. The 
term "hierarchy" here refers to a simple ordering 
of tones according to their relative importance or 
stability within a given key, not to a structure 
with several nested levels (as in Lerdahl and 
Jackendoff, 1983). The hierarchy of the tones 
within a given key is likened to the organization of 
category members around a prototypical exemplar, 
in this case the base note or tonic, which serves as 
a cognitive point of reference. Krumhansl's 
experimental probe tone method (developed in 
collaboration with Roger Shepard) presents 
listeners with a sequence of notes that 
unambiguously define a particular key (e.g., a 
scale or a tonic triad chord), followed by a single 
note of variable pitch. Subjects judge on a rating 
scale how well this final probe tone fits into the 
context of the established key. The resulting pattern 
of average ratings across all tones of the 
chromatic scale describes the tonal hierarchy: The 
tonic (scale step 1) is rated highest, followed in 
major keys by scale step 5, steps 3 and 4, steps 2 
and 7, and finally the chromatic tones that are not 
members of the key. In minor keys, the order is 1, 
3, 5, then steps 2, 4, and 7, and finally the chro- 
matic tones. These hierarchies correspond to the 
functional importance of the scale notes in tradi- 
tional tonal music, as described by musicologists. 

By computing the auto- and cross-correlations 
between the rating profiles for all possible pairs of 
major and minor keys, Krumhansl derives a 
matrix of interkey similarities that she then 
subjects to nonmetric multidimensional scaling to 
obtain a spatial configuration of interkey 
distances. The configuration is strikingly regular, 
due to the constraints built into the data, and it 
also makes sense: Two dimensions in which the 
points representing the keys are arranged 
according to the circle of fifths are convolved with 
two dimensions in which the keys are arranged in 
a circle that reflects relative and parallel 
relationships between major and minor keys. The 
total four-dimensional pattern can be visualized 
as the surface of a torus (a doughnut), or the 
surface can be spread out in two dimensions 
representing the angular coordinates of the keys 
in the two circular configurations. This latter, two- 
dimensional key map resembles maps drawn 
intuitively by musicologists: The key of C major, 
for example, is adjacent to the major keys 
differing in one note (G and F major), to the 
relative minor key (a minor), and to the parallel 
minor key (c minor). Thus it provides an empirical 
validation of musicologists' insights through 
listeners' probe tone ratings.^2 Krumhansl adds a 
cautionary note by pointing out that her model 
does not account for possible directional 
asymmetries in key similarity. 
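
The derivation of interkey similarities described above can be sketched in a few lines. This is only an illustration under stated assumptions: the profile values are the figures commonly quoted for the Krumhansl-Kessler probe tone ratings and are not given in this review, the function names are hypothetical, and the multidimensional scaling step that turns the similarities into a spatial map is not shown.

```python
from math import sqrt

# Probe tone rating profiles for C major and C minor, indexed by pitch class
# (0 = C, 1 = C#, ..., 11 = B). ASSUMPTION: values as commonly quoted for the
# Krumhansl-Kessler profiles; they do not appear in the review itself.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def transpose(profile, tonic):
    """Rotate a C-based profile so that its tonic falls on pitch class `tonic`."""
    return [profile[(pc - tonic) % 12] for pc in range(12)]


def key_profiles():
    """Build the rating profiles of all 24 major and minor keys."""
    keys = {}
    for tonic, name in enumerate(NAMES):
        keys[f"{name} major"] = transpose(MAJOR, tonic)
        keys[f"{name} minor"] = transpose(MINOR, tonic)
    return keys


def interkey_similarity():
    """Correlate every pair of key profiles (24 x 24 entries)."""
    keys = key_profiles()
    return {(a, b): pearson(pa, pb)
            for a, pa in keys.items() for b, pb in keys.items()}


sim = interkey_similarity()
for other in ["G major", "F major", "A minor", "C minor", "F# major"]:
    print(f"C major vs {other}: {sim[('C major', other)]:+.3f}")
```

Because every key profile is a rotation of one of two base profiles, the resulting matrix is highly constrained, which is one reason the scaling solution comes out so regular. The matrix is also symmetric by construction, so it cannot express directional asymmetries between keys.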

Following this methodological tour de force, the 
author turns in Chapter 3 to a discussion of the 
factors that may underlie listeners' knowledge of 
tonal hierarchies. She considers two: the phe- 
nomenon of tonal consonance, and the statistical 
distribution of pitches in tonal music. Strong 
correlations of tonal hierarchies with consonance 
hierarchies would suggest that tonal hierarchies 
originate in the acoustics of complex tones and 
therefore are relatively fixed and universal. 
Stronger correlations with the distribution of 
tones in familiar music, on the other hand, would 
suggest that tonal hierarchies are learned and 
culture-bound. Krumhansl briefly reviews acousti- 
cally-based theories of tonal consonance and then 
proceeds to describe the correlations between her 
tonal hierarchies and consonance hierarchies 
culled from various studies in the literature.^3 The 
correlations are moderately high for major keys 
but lower and mostly nonsignificant for minor 
keys. Krumhansl then proceeds to compare the 
tonal hierarchies with the statistical frequency 
distributions of tones in various selections of tonal 
music, again obtained from the literature. These 
correlations are much higher and significant for 
both major and minor keys. Finally, a multiple 
regression analysis is performed, which demonstrates 
that tonal consonance does not account for 
any aspect of tonal hierarchies that is not also ac- 
counted for by tonal frequency distributions. On 
the basis of these results, Krumhansl argues that 
tonal hierarchies are learned through listening to 
tonal music and hence are a product of musical 
acculturation. Research by Lynch et al. (1990a, 
1990b), too recent to be cited by Krumhansl, 
indeed suggests that this acculturation begins in the 
first year of life. 

In Chapter 4, Krumhansl turns to a practical 
application of her tonal hierarchy results: 
determination of the key for a musical excerpt, 
and of changes in key as music progresses. Her 
key-finding algorithm (developed with Mark 
Schmuckler) is simple: The total duration of each 
note in the musical excerpt is determined by 
combining repeated occurrences of the same note, 
regardless of octave or ordinal position. The 
resulting relative durations of the 12 notes in the 
octave (with zero for notes that do not occur) are 
then correlated with the tonal hierarchy profiles 
for the 24 major and minor keys, as described in 
Chapter 2. The largest correlation identifies the 
dominant key. Other large correlations identify 
related keys that may also be suggested by the 
musical passage. Indeed, the major virtue of the 
algorithm is seen in its ability to yield a key 
hierarchy, rather than just a single dominant key. 
As Krumhansl demonstrates, music theory 
experts can rate the relative strengths of 
candidate keys for short musical excerpts. 
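
The key-finding procedure summarized above is simple enough to sketch directly. What follows is a minimal illustration, not the authors' implementation: the profile values are the figures commonly quoted for the Krumhansl-Kessler ratings (they are not given in this review), the function name `find_key` and the sample excerpt are hypothetical, and the input is assumed to be already reduced to total durations per pitch class.

```python
from math import sqrt

# ASSUMPTION: commonly quoted Krumhansl-Kessler probe tone profiles for
# C major and C minor, indexed by pitch class (0 = C, ..., 11 = B).
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def find_key(durations):
    """Rank all 24 keys by how well their profile matches the excerpt.

    `durations` holds the summed duration of each of the 12 pitch classes,
    ignoring octave and order, with zero for notes that do not occur.
    Returns (key name, correlation) pairs, best match first.
    """
    scores = []
    for tonic, name in enumerate(NAMES):
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            # Rotate the C-based profile so its tonic sits on `tonic`.
            shifted = [profile[(pc - tonic) % 12] for pc in range(12)]
            scores.append((f"{name} {mode}", pearson(durations, shifted)))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical excerpt: the notes of the C major scale, with extra
# duration on C, E, and G.
excerpt = [4, 0, 1, 0, 2, 1, 0, 3, 0, 1, 0, 1]
ranking = find_key(excerpt)
for key, r in ranking[:4]:
    print(f"{key}: {r:+.3f}")
```

Returning the full ranking rather than only the top entry mirrors the point made above: the algorithm yields a graded key hierarchy, so closely related keys show up with high but lower correlations.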

The effectiveness of the key-finding algorithm is 
demonstrated in three specific applications and is 
compared to other procedures proposed in the 
literature. In the first application, the algorithm is 
used to determine the nominal keys of preludes 
(24 each) by Bach, Shostakovich, and Chopin, 
based on the first four notes only. The results are 
quite accurate for Bach and Shostakovich, but less 
so for Chopin. In the second application, 24 fugue 
subjects of Bach and Shostakovich are analyzed in 
terms of how many notes are needed to determine 
the correct key. The average number of notes is 
about 5, considerably less than required by 
alternative algorithms proposed in the literature. In 
the third and most elegant application, the key 
modulations in a single Bach prelude are tracked 
measure by measure and compared to judgments 
by two experts. There is good agreement, though 
the algorithm does not quite match the experts. 
The key changes are represented graphically as a 
path on the surface of the torus representing the 
configuration of interkey distances (Chapter 2). A 
final section of Chapter 4 acknowledges the current 
limitations of the algorithm, which include its 
insensitivity to temporal order, melodic patterns, 
harmonic structure, and rhythmic stress. 

In Chapter 5, Krumhansl returns to perceptual 
data and in fact reports an original study not 
published elsewhere, which replicates and extends 
one of her early experiments. The topic is the 
perceived relation between two musical tones. 
Whereas in the experiments that led to the tonal 
hierarchy profiles the subjects' task was to judge 
how well a single note fit into the tonal context 
established by a precursor sequence, listeners now 
hear two notes following the key-defining context, 
and the task is to rate how well the second note 
goes with the first. The goal is to demonstrate that 
these perceived relations depend on the tonal 
context; for example, that the notes C-G are 
perceived as a better sequence than the notes C#- 
G# when the key is C major, even though both 
note sequences represent the same musical 
interval (a fifth). Krumhansl starts out by 
discussing various spatial representations of the 
psychological pitch relations among tones, which 
increasingly take the functional roles of tones into 
account. Although the author persists in talking 
about the perceived relations among successive 
tones, what her experiment is really about is the 
functional role of two-note sequences within a 
given key. It comes as no surprise, then, that the 
order of the two notes plays an important role, a 
factor that cannot be accommodated by traditional 
multidimensional scaling of similarity data. 
Krumhansl nevertheless presents the results of 
such an analysis, but also notes its shortcomings. 
The spatial solution that best approximates the 
perceived tonal relations shows the tonic at the 
vertex of a cone, along whose circumference the 
other tones are arranged according to pitch, but 
with their distance from the tonic being an inverse 
function of their position in the tonal hierarchy. A 
more complete picture including order effects 
emerges from a multiple-regression analysis: 
Listeners' judgments were most strongly 
influenced by the position in the tonal hierarchy of 
the second tone, with weaker but significant 
contributions of the tonal hierarchy of the first 
tone, the pitch distance between the two notes, 
and the distance between the two notes along the 
circle of fifths.^4 The chapter concludes with a 
demonstration that the results are positively 
correlated with the relative frequencies of melodic 
intervals in several musical corpora, as tabulated 
previously by others. 

Chapter 6 first summarizes three principles that 
have emerged from this research and from the 
work of others on perceptual organization and 
memory. The principle of contextual identity 
states that stable tones (i.e., tones high in the 
tonal hierarchy) are remembered better than 
unstable tones. The principle of contextual 
distance states that two tones are perceived as the 
more closely related (and hence are also more 
easily confused in memory) the more stable either 
of them is. The principle of contextual asymmetry 
states that two tones are perceived as more closely 
related when the second tone is more stable than 
the first than when they are in the opposite order. 
These principles are expressed formally in terms 
of perceptual distances, and relevant findings are 
cited from the literature. The principles are said 
to support basic tenets of Gestalt theory, with 
tonality providing a kind of Gestalt quality, 
though (to this reader) this argument does not add 
any explanatory power. The second half of the 
chapter discusses perceptual grouping principles 
in music, with data from several recent studies by 
the author and her collaborators. These studies 
show that pitch and rhythm make independent 
contributions to perceived phrase structure, that 
there are reliable boundary cues in performances 
of pieces by Mozart as well as Stockhausen, and, 
most intriguingly, that 6-month-old infants prefer 
music that is interrupted at phrase boundaries to 
music that is interrupted in the middle of a 
phrase. Lowering of pitch and increases in tonal 
duration are identified as boundary cues likely to 
have been salient to these infants, and the 
analogy to speech prosody is noted. 

Chapters 7 and 8 are easily summarized. They 
report the results of experiments with chords that 
replicate in all essentials the experiments with 
tones described in Chapters 2 and 3. Chapter 7 
reports data not published previously. Listeners 
were presented with one (Chapter 7) or two 
(Chapter 8) triadic chords following a key- 
establishing context and judged how well they 
followed the preceding context. The results are 
shown to reflect the relative stability of the chords 
in the tonal system, and they illustrate each of the 
three general principles discussed in Chapter 6. 
Memory for chords in a sequence is also shown to 
reflect relative stability, and chord stability is 
found to correlate with frequency of occurrence of 
chords in tonal music. Krumhansl concludes by 
summarizing the many parallels between the 
perceptual organization of tones and chords. 

All the work up to this point can be considered 
as concerned with establishing basic facts concern- 
ing tonal organization in perception and memory. 
In Chapter 9, Krumhansl summarizes two studies 
that make use of these basic data in addressing 
two more complex scenarios: key modulation and 
polytonality. In the key modulation experiment, 
probe chords are presented after every single 
chord of chord sequences that modulate to close or 
distant keys. By correlating subjects' judgments 
with the tonal hierarchy profiles obtained previ- 
ously in unambiguous key contexts (Chapter 2), 
the relative strengths of different keys can be 
assessed as the chord sequence unfolds. By treating 
these strengths as distances, the changing sense 
of key through the chord sequence can be 
represented as a path in the toroidal key-distance map 
derived in earlier studies. The analysis reveals 
listeners' initial resistance to radical key changes, 
followed by abrupt shifts into the new key when 
the following context confirms it. In the experi- 
ment on the perception of polytonality, a famous 
excerpt from Stravinsky's "Petrouchka" is used in 
which two distant keys (C and F# major) are 
used simultaneously.^5 Probe tones are presented 
after the bitonal passage, as well as after each 
tonal component played separately. Detailed 
analysis of the results suggests that subjects' 
judgments are governed by the notes presented, and 
hence also by both keys, but that no clear sense of 
either tonality develops. Listeners were generally 
unable to focus on one or the other tonality, even 
when instructed to do so. Thus it seems that 
polytonality, in this instance at least, prevents the 
establishment of either a single or a multiple tonal 
framework; instead, it creates ambiguity. 

The author ventures farther afield in Chapter 
10, which reports studies that applied the probe 
tone technique to 12-tone serial music, to North 
Indian classical music, and to Balinese gamelan 
music (the last study done by Kessler and col- 
leagues). The resulting probe tone profiles, ob- 
tained at various points during and/or following 
musical excerpts from these various styles, were 
analyzed to determine the factors that played a 
role in subjects' judgments. In the study of 12-tone 
music (excerpts from two of Schoenberg's works), 
two groups of subjects could be distinguished 
whose patterns of responding were almost exact 
opposites of each other. One group, generally more 
familiar with 12-tone music, avoided tonal impli- 
cations like the plague, whereas the other group 
was governed by whatever tonal implications they 
could derive from surface features such as note 
length and recency. Similarly, in the experiment 
using North Indian music in different keys (thats), 
experienced subjects gave probe tone profiles that 
enabled Krumhansl to recover through multidi- 
mensional scaling analysis the key (that) distance 
map postulated a priori, whereas other subjects 
gave a much less clear pattern and seemed to be 
governed by surface features of the music rather 
than by the underlying scales. Krumhansl's 
conclusion that "listeners can set aside ... expectations 
and hear the pitch events in style-appropriate 
terms quite independently of their prior musical 
experience" (p. 268) is perhaps premature, but her 
results demonstrate that musical compositions in 
different styles often provide the "surface" 
information (emphasis, repetition, lengthening of 
important notes) a listener needs to infer the 
characteristics of the style, so the prior experience 
is simply not needed to appreciate simple struc- 
tural features. Interestingly, orthodox 12-tone 
music is different in that it studiously avoids such 
surface aids to the listener, so, in order to respond 
appropriately to this music, listeners need to know 
what not to expect. This is an interesting 
demonstration of the inherent radicalism of 
dodecaphonic music, and an indication that it negates 
not only traditional (i.e., 19th century) aesthetic 
values but psychological principles as well. 

In her final chapter, Krumhansl first discusses 
rather briefly some formal properties of the tonal 
system and of some other scale systems,^6 and 
speculates that these properties may have arisen 
from psychological constraints, thereby suggesting 
interesting future research to be done. The final 
pages summarize the principal findings from the 
empirical studies. The perception of tonal music is 
said to exhibit "one of the hallmarks of a cognitive 
system: the categorization and classification of 
sensory information in terms of a stable, internal 
system of structural relations" (p. 282). That 
system, Krumhansl claims, is abstracted and 
internalized by listeners from the sound events in 
the music they encounter; that is, it is learned and 
style-specific, though it makes use of general 
cognitive architecture to represent the external 
regularities. 

Krumhansl's book is a superb accomplishment 
and represents cognitive psychology at its best. 
This does not mean that it is beyond criticism. The 
question is, quite simply, whether cognitive 
psychology at its (current) best is good enough to 
explain musical phenomena. Music is a very 
highly developed art form whose complexities 
have kept musicologists busy for centuries. 
Cognitive psychology is not particularly well 
suited to studying art forms, or at least has not 
yet proven to be. What, to Krumhansl, are major 
insights gained from a decade of research may be 
platitudes to a musicologist (Butler, 1990) or 
musician. This problem is endemic to cognitive 
psychology, which searches for general principles 
that cut across many domains. However, it is with 
the specific properties of music that the study of 
music proper begins. It could be argued that 
cognitive psychology and the serious study of 
music are mutually exclusive, though perhaps 
complementary. If so, then even a tour de force 
such as Krumhansl's research will inevitably miss 
the significant issues in music perception. 
Nevertheless, it may provide a general framework 
within which these music-specific issues may be 
addressed in a more rigorous manner. 

The probe-tone task has been criticized by 
Butler (1989) as being insensitive to the dynamic 
unfolding of harmonic implications in tonal music, 
as permitting alternative listener strategies, and 
as being more sensitive to the tone distributions in 
the key-defining context than to the implied 
tonality. Krumhansl's reliance on tabulations of note 
durations and frequencies was likewise attacked 
by Butler as being a crude method. The resulting 
exchange (Krumhansl, 1990; Butler, 1990) has not 
settled these issues completely, and further 
research will be necessary. It certainly would be 
inappropriate to conclude (as Butler tends to do) 
that any of Krumhansl's results are artifactual 
until they have been proven to be so by careful 
follow-up experiments.^7 For one thing, most of her 
findings are in good agreement with conventional 
musical wisdom, which makes it likely that they 
will stand the test of time. It seems to this re- 
viewer that Krumhansl has justifiably ignored 
some significant musical detail in order to arrive 
at generalities, but the detail will have to be dealt 
with eventually. The dynamic tracing of harmonic 
expectations described in Chapter 9 certainly is an 
interesting beginning in that direction. 
Unfortunately, the probe tone method becomes 
prohibitively time-consuming as a tool for investi- 
gating modulations in real music, and trained mu- 
sicologists' judgments may ultimately prove not 
only more convenient but also more reliable. 

Like most cognitive psychology studies, 
Krumhansl's research is not concerned specifically 
with expert knowledge. After finding early on that 
musically untrained subjects tend to use 
nonmusical response strategies in the probe tone 
task, she relied in the following on listeners who 
had considerable musical training but were not 
necessarily professional musicians or 
musicologists. This is both a strength and a 
weakness. It is a strength in so far as it 
demonstrates the solid, ingrained knowledge 
musically informed listeners have of the tonal 
system. It is a weakness in that it does not 
characterize what, if anything, musically 
uninformed listeners know about music, and, 
more importantly, in that it misses the special 
skills and insights provided by highly trained 
musicians and musicologists. In any investigation 
of a highly developed art, expert judgments must 
be the measure of validity, even if those 
judgments sometimes diverge. The consensus of 
average listeners can tell us what the average 
listener knows, but it will not capture the full 
subtlety of the phenomena under investigation. 

What about Serafine's (1988) warnings, cited at 
the beginning of this review? It is certainly true 
that Krumhansl took certain musical elements 
(the "results of reflection") as given and proceeded 
to develop her representations of mental struc- 
tures in terms of those units (tones, chords, and 
keys). Her claim is undoubtedly that, even when 
not reflected upon, these units play a functional 
role in mental processing. It is also true, however, 
that the probe tone task directs the listener's at- 
tention to a particular unit (a tone or chord) and 
requests a judgment about it in the context of an 
often stereotyped and much-repeated, musically 
trivial context that, moreover, is rendered in 
electronic sound and with mechanical timing and 
dynamics. There are exceptions, such as the 
experiment using actual excerpts from Schoenberg's 
music as the context for probe tones (Chapter 10). On 
the whole, however, it is quite possible that the 
musical samples in Krumhansl's experiments 
were not "listened to as music," by which Serafine 
(1988) presumably means that they were not per- 
ceived as musically meaningful or expressive. It is 
the more remarkable, then, that these meaning- 
less sequences inevitably and strongly engaged 
the listeners' knowledge of tonal hierarchy struc- 
tures; in a sense, then, they had some musical 
meaning, after all. It is the cognitive psycholo- 
gist's trump card that even highly schematic 
stimuli often engage mental structures designed 
for much more complex and ecologically valid 
events. It is the expert's wild card, however, that 
only a very limited subset of pertinent structures 
can be probed in this way, so that an 
impoverished view of complex phenomena may result. 

In her introduction, Krumhansl refers several 
times to "musical experience," but her research 
does not really deal with listeners' experiences. 
They made judgments that followed certain 
patterns; what they experienced, we do not know 
(probably boredom). Krumhansl's spatial maps 
present us with crystallized configurations, mental 
structures in vitro, as it were, that can be 
regarded with awe, like a piece of modern 
architecture. They convey none of the excitement and 
pleasure that comes from exploring the building, 
its corners and hallways. For an appreciation of 
musical meaning, we must read Langer (1953) or 
Zuckerkandl (1956) or Clynes (Clynes & 
Nettheim, 1982), or simply listen. Krumhansl's 
cognitive world is one of discourse about music, 
not of music as "significant form" (Langer, 1953). 
Yet, there must be a close relation between the 
two. The characterization of that relation is 
perhaps the fundamental problem of music 
psychology. Krumhansl would be well equipped to tackle 
it as the next step in her remarkable career. 

FOOTNOTES 

*American Journal of Psychology, 104, 611-621 (1991). 

^1 That is, considered as a post-war empirical enterprise. The 
psychology and philosophy of music have, of course, a 
distinguished history that goes back many centuries. 

^2 Interestingly, the map goes beyond earlier representations in that 
it suggests that C major is closely related to yet another key, e 
minor (which, in its descending melodic version, differs in just 
one note from C major), but not to d minor, which also differs in 
one note. It is not clear whether this observation is substantiated 
by any musicological evidence. 

^3 A consonance hierarchy (my term) results from quantitative 
estimates of predicted or perceived consonance for all tones of a 
scale when they are sounded together with the tonic of that 
scale. 

^4 It is possible to regard the two-tone judgment task as a version of 
the one-tone task: The first probe tone merely extends or 
perturbs the tonal context in which the second tone is judged. 

^5 An unfortunate mistake in this otherwise very carefully edited 
volume occurs in the musical examples on page 229: The bottom 
staves should be in treble clef throughout, not in bass clef. On 
the same page, in the penultimate line, "diminished triads" 
should read "diminished chords". Also, Krumhansl's spelling of 
"Petroushka" is an unfortunate amalgam of Stravinsky's original 
French "Pétrouchka" and its anglicized version, "Petrushka". 

^6 An error occurs on page 277: There are not two but three 
octatonic scales; the "2-scale" was mistakenly omitted from 
Table 11.4 at the bottom of the page. 

^7 One of his arguments revolves around the fact that the original 
tonal hierarchy profiles were based on data from a subset of 
contextual conditions in which the most stable tones occurred 
more often than the unstable tones. It appears that Krumhansl 
and Kessler selected those conditions that were most effective in 
inducing a sense of key, and it is not surprising that these 
contexts were precisely those that emphasized stable tones. 
Butler's question of whether the subjects' ratings reflected their 
sense of key or the frequency of occurrence of the stable tones 
seems somewhat academic. 

Toward an Emancipation of the "Weaker Sense"* 

A review by Bruno H. Repp 

Listening: An Introduction to the Perception of Auditory Events by Stephen Handel. 

Cambridge, MA: MIT Press, 1989. 

Auditory Scene Analysis: The Perceptual Organization of Sound by Albert S. Bregman. 

Cambridge, MA: MIT Press, 1990. 

Cognitive Foundations of Musical Pitch, by Carol L. Krumhansl. 

Auditory perception has always been the 
stepchild of psychology. The rapid advances of 
computer technology have, if anything, further in- 
creased the hegemony of the visual sense: The 
prototypical computer combines stunning graphic 
capabilities with a primitive sound inventory, and 
it seems useless without a monitor, whereas a 
loudspeaker can easily be dispensed with. The op- 
tic display capabilities of computers are utilized 
widely in psychological research, their 
sound-generating capabilities only by a few specialists. Even 
in those branches of psychology that ostensibly 
deal with audible things, such as psycholinguistics 
and psychomusicology (not to speak of their 
ancient armchair relatives, linguistics and 
musicology), theory and experimentation are commonly 
based on visual representations of the objects un- 
der study. Books on auditory subjects (such as the 
three reviewed here) usually have plenty of fig- 
ures, but no sound sheets. 

It comes rather as a shock, therefore, when 
Stephen Handel opens his book with the 
confession that "In our culture, I would much prefer to 
be blind than to be deaf" (p. 1). With this simple 
but soon convincing statement, he reminds us of 
the unique importance of the auditory sense to our 
life experience: More than vision (but rather like 
the tactile sense, which is much more limited in 
range), audition keeps us "in touch" with our 
environment. Moreover, it is the basis of the two most 
important systems of human communication: 
speech and music. If television had no sound, it 
would never have edged out radio as the most 
popular medium of news and entertainment. 

Much of the psychological research on hearing 
in recent decades has employed simple sounds and 
sophisticated psychophysical methods; this 
"psychoacoustics" continues to thrive in a 
somewhat segregated fashion in many laboratories and 
in the pages of The Journal of the Acoustical 
Society of America. Another significant area in 
auditory research is speech perception, which is 
even more of a segregated specialty, called "speech 
science" (or "speech technology") as soon as some 
application is in sight, and otherwise largely 
associated with work done at Haskins Laboratories 
and the reactions of others to it. Most of the 
speech perception work has been at the fringe of 
psychoacoustics, with speech sounds being taken 
apart into their smallest components until they 
ceased to be speech and researchers felt on famil- 
iar territory again. Much the same can be said 
about research on music perception, a large part 
of which has been concerned with pitch, duration, 
and loudness. 

At the same time, a few small rivulets began to 
flow beside the mainstream of reductionistic audi- 
tory research (itself a minor tributary to the St. 
Lawrence of largely vision-based psychology). 
James Gibson, in his influential discussions of eco- 
logical principles of perception, had relatively lit- 
tle to say about audition (though there is a chap- 
ter in Gibson, 1966), but his emphasis on envi- 
ronmental objects and events, and on the per- 
ceiver's attunement to systematically structured 
physical media, spread seeds which germinated 
during the 1970s. In the 1980s, Carol Fowler 
emerged as a champion of an ecological perspec- 
tive on speech perception (see Fowler, 1986), and 
James Jenkins wrote a stimulating chapter pre-
saging a science of ecological acoustics (Jenkins,
1985). Albert Bregman's research program on
auditory organization had been under way for
some time and yielded a steady stream of research
reports, with occasional contributions from others
(e.g., Kubovy, 1981), applications to speech per-
ception (e.g., Darwin, 1984), and a lively counter-
point of inventive experiments from Richard
Warren's laboratory (e.g., Warren, 1984). Music
psychology, a relatively obscure enterprise 
through the 1970s, suddenly gained momentum 
through publications such as the book edited by 
Deutsch (1982), Roger Shepard's (1982) influential 
article on pitch structures, and the extensions of 
his work by his former student, Carol 
Krumhansl.^1

The three books reviewed here reap the harvest 
of these developments. One of them, Handel's 
Listening, is a broad introduction to the
perception of auditory events, with special
attention to speech and music. Bregman's
Auditory Scene Analysis focuses more narrowly on
the perceptual organization of simple sound
patterns, but treats this topic expansively, with
the author's own ideas and research at the center
of attention. Krumhansl's Cognitive Foundations
of Musical Pitch is even more specialized in that it
summarizes the author's research since the late
1970s, with only brief digressions into related
literature. It is also considerably more succinct
than the other two tomes, and it does not share
their overt ecological orientation, being squarely
in the tradition of cognitive psychology. What all
three books have in common is excellence.^2

AUDITORY EVENTS 
Handel's book begins with a detailed but very
readable introduction to the physics of sound pro- 
duction, presented without mathematical formu- 
las but with many illustrations. This is followed 
by a chapter on sound propagation in the envi- 
ronment, by two chapters dealing specifically with 
sound production by musical instruments and by
the human vocal tract, and by a chapter summa- 
rizing acoustic (and, very briefly, perceptual) 
commonalities between speech and music. The 
remaining chapters, which constitute roughly two
thirds of the book, deal with issues of perception. 
The first of these chapters is on auditory stream 
segregation and lucidly overviews a topic treated
in much more detail in Bregman's book. Chapter
8, "Identification of Speakers, Instruments, and
Environmental Events", is particularly valuable in
pointing out the common aspects of these impor- 
tant activities, which have been given less re- 
search attention than they deserve. Chapter 9 
deals primarily with categorical perception and
context effects, with the focus on speech. Under
the unfortunately misleading title, "Grammars of
Music and Language", the next chapter deals al-
most entirely with music, particularly pitch struc-
tures, anticipating Krumhansl's more detailed 
treatment. (Was an earlier section on linguistic 
grammar deleted at the last minute?) The follow-
ing chapter on rhythm is more balanced and pro-
vides a very useful discussion of music in juxtapo-
sition with prosodic aspects of speech. The final
chapter, somewhat surprisingly, is on auditory 
physiology, but summarizes what is known about 
the auditory processing of complex sounds and 
speech, so that it ties in well with the general 
thrust of the book. A brief epilogue points out two 
aspects that were neglected in the book: the role of 
the listener's expectations and knowledge, and a 
characterization of the experience of listening. 

Handel's book contains a wealth of information, 
presented accurately, in simple prose, with 
numerous instructive examples. It brings 
together, often for the first time, topics that have 
been treated in articles scattered through the 
research literature, and it provides a coherent 
perspective on them. The writing is modest, 
thoughtful, and balanced; there is no dogma or 
strident criticism, nor any oversimplification of 
complex issues. Handel always shows a healthy 
respect for the complexity of natural phenomena, 
and he inculcates the same attitude in the
receptive reader. As Albert Bregman says on the 
book jacket, "Listening is obviously the work of a
master teacher."

AUDITORY SCENE ANALYSIS 
Bregman's own book, Auditory Scene Analysis,
is narrower in scope than Handel's but probes the 
topic in much greater depth. At 773 pages surely 
one of the heftiest monographs ever published in 
psychology, it rests heavily on Bregman's own re-
search since the late 1960s and on the contribu-
tions of a few other scientists working in the same
area. Its leisurely, narrative style at times gives it 
the quality of a historical or philosophical treatise. 
In a very real sense, Bregman serves as the histo- 
rian of his own ideas and research. One rarely 
gets such an intimate view of a scientist's mind at
work, nor such a comprehensive picture of per- 
sonal observations, experimental explorations, 
and alternative interpretations. Bregman invites 
the reader to join him on his intellectual journey, 
and I, at least, found the book difficult to put 
down. If Handel's book shows a master teacher at
work, then this is the product of a master thinker. 

The term "auditory scene analysis" was coined
by Bregman to refer to the process of organizing
complex auditory input into internally coherent
"streams" or auditory objects. He distinguishes
two classes of such processes: "primitive" and
"schema-based" stream segregation. The book
deals primarily with primitive processes, which do 
not depend on a listener's domain-specific knowl- 
edge. Bregman believes (although he acknowl- 
edges that further research is needed) that 
primitive scene analysis segregates auditory 
events before they are interpreted with reference 
to learned "schemas," as in listening to speech or 
music.

Following an introductory chapter, more than 
half of the book is taken up by Chapters 2 and 3, 
which deal with sequential (temporal) and simul-
taneous (spectral) integration/segregation, respec-
tively. Chapter 2 introduces the now well-known
phenomenon of auditory stream segregation and 
the seminal work of van Noorden (1975), surely
the most cited unpublished dissertation in the
field, and proceeds to discuss exhaustively what
is known about the various factors that influence 
the perceptual grouping of acoustic elements. 
Chapter 3 discusses the factors that cause simul- 
taneous tones to fuse into a single percept or to be 
perceived as separate pitches or timbres. It covers
a good deal of more traditional psychoacoustics
(such as pitch perception, binaural fusion, mask-
ing, etc.), but Bregman never strays very far from
his own research and brings in the findings of oth-
ers primarily to illuminate or supplement the
story of his enterprise. 

The reader who has been persistent enough to
plow through these two enormous but fascinating 
chapters, each a small book in itself, is faced with 
five additional, shorter chapters. Chapter 4,
"Schema-Based Integration and Segregation," is
shorter because Bregman's goal is to distinguish 
and separate knowledge-guided processes from 
primitive auditory scene analysis, and to keep the 
focus on the latter. Perhaps the most important 
theoretical argument of the book is that primitive 
scene analysis is independent of acquired knowl- 
edge, though what has been divided by scene 
analysis can sometimes be recombined into a 
higher-level (schema-based) unit. In the popular 
jargon of contemporary cognitive science (which 
Bregman studiously avoids), primitive scene anal-
ysis is modular and noninteractive. Chapters 5
and 6 deal with auditory organization in music 
and speech perception, respectively. Again, these
discussions focus on the role of primitive scene 
analysis, not on the perceptual consequences of 
the categories and structures specific to each sys- 
tem. Thus they address such basic questions as
"What makes a melody hang together?" and "What
makes the different voices in polyphonic music
distinct?", or in speech, "Why are the sounds of
speech perceived as a coherent stream?" and "How
do we separate several simultaneous voices from
each other?" The parallel nature of these ques-
tions in music and in speech reflects the univer-
sality of auditory scene analysis. Music- and
speech-specific knowledge is considered a 
nuisance factor from the perspective of this book, 
which treats music and speech as pure sound. 
This may disappoint some musicologists and 
linguists among the readers, but Bregman should 
not be blamed for saying little about topics that 
his book is not about; rather, the rigor of his 
approach must be admired, for there is a 
continuous temptation to elevate (or, rather, 
reduce) knowledge-based processes to the status of 
auditory primitives, particularly in the case of 
speech.

Chapter 7 presents a relevant case study. Under 
the heading of "The Principle of Exclusive
Allocation in Scene Analysis", Bregman discusses
the phenomenon of duplex perception, an instance
in which the principle is violated (i.e., the same
sound appears to be heard as part of two different
streams). In fact, Alvin Liberman and his
collaborators at Haskins Laboratories claim that
speech schemas (Bregman's term) override and
even "pre-empt" auditory scene analysis (see, e.g.,
Liberman & Mattingly, 1989). Bregman discusses
evidence to the contrary. Still, the issue remains 
somewhat unresolved at the end of the chapter, 
which is the most difficult and the least definitive 
in the book. The last chapter, "Summary and 
Conclusions: What We Do and Do Not Know about 
Auditory Scene Analysis" condenses the book's 
contents onto 65 pages, much for the benefit of 
readers who just want to get the gist of it. Here 
and throughout the volume, Bregman's honesty in 
acknowledging unresolved questions and missing 
empirical evidence is exemplary. There are many 
leads for future research to be done, and 
Bregman's accomplishment is made all the more 
impressive by his careful delineation of its current 
limits. This book will stand as an important
milestone in the history of 20th century
psychology, as well as an inspiring human
document.^3

PITCH STRUCTURES 

With Krumhansl's monograph we enter a 
different world, yet one that dovetails nicely with 
Bregman's and especially with Handel's introduc-
tion. Krumhansl is concerned with some of the
schema-guided processes in music perception that
exceeded the scope of Bregman's book, specifically
the relationships among the pitches of the
Western tonal system. The monograph is a natu-
ral outgrowth of Krumhansl's exceptionally sys-
tematic and coherent research program, which is
almost unique in the burgeoning field of music
psychology.

Lucid and organized throughout, Krumhansl's
writing lacks the old-fashioned charm of
Bregman's meandering thoughts. Instead, there is
a crystalline quality to her orderly designs and 
structural representations. Clearly, her most dis- 
tinctive achievement is in the domain of sophisti- 
cated quantitative analysis. As one of Roger 
Shepard's most brilliant students in the 1970s,
she absorbed the multidimensional techniques pi-
oneered by her mentor and proceeded to apply
them to musical problems in an imaginative and 
revealing way. Despite the formal complexity of 
these analyses, she makes the results always easy 
to grasp, with the help of many illustrations which 
are an essential part of the methodology. Having 
accorded Handel and Bregman master status, 
without intending to stereotype them in any way, 
I regard this book as the work of a master analyst.

Only a very brief summary of the contents can 
be given here. To convey the full flavor and ele-
gance of Krumhansl's research, a much longer
precis would be necessary, which will appear
elsewhere (Repp, in press). Krumhansl's primary
experimental technique is a probe task in which a 
musical context (a melodic fragment, sequence of 
chords, or excerpt from a composition) is followed 
by a probe tone or chord, whose adequacy as a 
continuation of the preceding context the listener 
is to judge. Probe elements are sampled exhaus- 
tively from a fixed set (such as the 12 tones of a 
scale), and a profile of average ratings across 
these elements is obtained. The autocorrelation 
matrix of this profile, which represents the simi- 
larities of the rating profiles for the same probe 
elements in the context of all different keys, is 
subjected to multidimensional scaling, which re- 
sults in a spatial representation of keys similar to 
such maps constructed intuitively by musicolo- 
gists. The ingenious part of the methodology be- 
gins when the key rating profiles established in 
the initial experiments are used as diagnostic 
tools for determining the perceived key following 
some arbitrary context. Thus Krumhansl devises a
key-finding algorithm based on the frequency dis-
tribution of the most recent notes and their corre-
lations with all possible key profiles; by presenting
probe tones after some musical excerpt, she de-
termines which (if any) key is dominant at that
point by finding the prototypical key profile that
most closely resembles the obtained rating profile;
and in the most advanced application of these pro-
cedures, she traces the listener's changing sense of
key through a modulating sequence of chords by 
obtaining a probe tone profile after each chord, 
correlating each of these profiles with the stan- 
dard key profiles, and finally mapping these rela- 
tionships into a multidimensional space, where 
they trace a modulatory path among the various 
keys (represented as points in the space). High- 
wire acts such as these are complemented by re- 
sults from simpler memory tasks and other stud- 
ies in the literature. 
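
The profile-matching step described above can be sketched in a few lines.
What follows is only an illustration under stated assumptions, not
Krumhansl's own procedure: the two rating profiles are the widely
reproduced Krumhansl and Kessler probe-tone values for C major and C minor
(they are not given in this review), the input is assumed to be a list of
12 pitch-class weights (note counts or summed durations), and all function
names are hypothetical.

```python
from math import sqrt

PITCH_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Assumed values: commonly cited Krumhansl-Kessler probe-tone ratings,
# indexed by semitones above the tonic. Not taken from the review itself.
MAJOR_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR_PROFILE = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def transpose(profile, tonic):
    """Rotate a tonic-relative profile so that index i is pitch class i."""
    return [profile[(i - tonic) % 12] for i in range(12)]


def key_correlations(pitch_class_weights):
    """Correlate the input distribution with all 24 key profiles."""
    results = {}
    for tonic in range(12):
        for mode, profile in (("major", MAJOR_PROFILE),
                              ("minor", MINOR_PROFILE)):
            key = (PITCH_NAMES[tonic], mode)
            results[key] = pearson(pitch_class_weights,
                                   transpose(profile, tonic))
    return results


def find_key(pitch_class_weights):
    """Return the (tonic, mode) whose profile correlates best."""
    correlations = key_correlations(pitch_class_weights)
    return max(correlations, key=correlations.get)


# Hypothetical excerpt: note counts per pitch class, C = index 0.
excerpt = [4, 0, 2, 0, 3, 2, 0, 3, 0, 2, 0, 1]
print(find_key(excerpt))
```

The same 24 correlations, recomputed after each chord of a modulating
sequence, would give the raw material for the key-tracing analysis
mentioned above; the multidimensional scaling step is not shown here.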

The central chapter of the book is Chapter 6, 
which defines three principles of tonal stability, 
and their effects on the perceived relations
between tones. Tonal stability is the central 
concept of the book, and indeed of traditional 
music theory; it refers to the fact that, in tonal 
music, there is a hierarchy of pitches, such that 
one pitch (the key-defining pitch or tonic) is most 
preferred or most important or most 
representative — in other words, most stable — at 
any given time, a second pitch (the dominant) is 
preferred next, and so on. Krumhansl's three
principles, then, are (in simplified language): A 
stable tone seems more similar to itself (e.g., is 
remembered better) than an unstable tone; two 
tones seem more similar to each other if either of 
them is stable; and two tones of unequal stability 
seem more similar when the unstable tone 
precedes the stable one. A variety of evidence 
supports these statements, which parallel 
predictions made with regard to prototypicality in 
many other areas of cognitive psychology. 
Krumhansl's work indeed falls squarely into
mainstream cognitive psychology and should con- 
tribute significantly to making music psychology 
seem part of this larger enterprise. 

THREE FIELDS ON THE MOVE

It is perhaps appropriate to conclude this review 
with some musings on the current state of three 
fields of research that are addressed by the books 
reviewed (ecological acoustics, speech perception, 
and music perception), and on the influence the 
books might have on research in the 1990s. As it
happens, the three fields named are at rather 
different stages of development: one nascent, one 
burgeoning, and one temporarily stagnant. These
impressions are subjective, of course, and depend
in large measure on where I draw the boundaries
of these domains of inquiry.




Ecological acoustics. Under this rubric I would 
consider studies that deal with the analysis and 
perception of information in complex sounds other 
than the message elements of speech or music:
information that helps us identify individuals,
objects, and events in our environment. 
(Ultimately, of course, the "ecologically valid"
study of speech and music as gestural events must 
be included, too.) Under this definition, Bregman's 
work is merely a prolegomenon to an ecological 
acoustics, though an essential one. Handel's 
Chapter 8 ("Identification of Speakers,
Instruments, and Environmental Events") is most
pertinent, inasmuch as speaker and instrument 
identification are not really linguistic or musical 
activities. The literature on human speaker 
identification is relatively small (much smaller 
than that on automatic speaker recognition), and
most of it originates in Europe. Research on
instrument identification is almost nonexistent. 
The related topic of the acoustic expression and 
perception of emotion in speech and music is 
likewise under-researched, with Klaus Scherer's 
work on speech standing as a single beacon in the 
desert (see Scherer, 1986). Handel discusses 
Warren and Verbrugge's (1984) study of breaking 
and bouncing events, which is a prototype for 
ecological acoustics research built on Gibsonian 
premises, but little has happened since except for 
a few isolated studies on seemingly exotic topics 
including "chilling" sounds (Halpern, Blake, &
Hillenbrand, 1986), hand clapping (Repp, 1987), 
and the sounds of kitchen pans struck with 
mallets (Freed, 1990). However, those who doubt 
the potential significance of studies in ecological 
acoustics may be converted by reading Tom 
Johnson's (1984) still unpublished dissertation on 
doctors' perception of human heart beats, a bril- 
liant foray into real-world relevance. The message 
of all these studies is that we hear not just sounds 
but, through their structure, environmental hap- 
penings and organisms in action. Hopefully, 
Handel's book will stimulate more research on 
how we use our ears to perceive actions and 
events — how we hear the world. 

SPEECH PERCEPTION

Research on speech perception began at Haskins
Laboratories in the early 1950s and largely 
remained dependent on the technology available 
there for the next two decades. Then, with 
computers getting smaller and cheaper, and with
software replacing hardware synthesizers, other 
laboratories got into the business. Much of that 
research, however, remained methodologically and
conceptually dependent on the Haskins research:
Nearly everyone tried to support, refute, or extend
the claims of the Haskins researchers. (Among the
few significant exceptions, Richard Warren's
consistently original, though psychoacoustically
tinged, contributions are especially noteworthy;
see, e.g., Warren, 1982, 1984.) The 1970s and
early 1980s were fertile years for speech
perception research, with several popular 
paradigms being milked dry and lively arguments 
going back and forth. Now these activities seem to 
have slowed down, very much in proportion to the 
decline of speech perception research at Haskins
Laboratories, where most of the effort is nowadays 
directed to speech production. The older 
generation of speech perception researchers has 
reached retirement age, and many of the younger 
(now middle generation) protagonists of the 1970s 
have turned to different topics or tend to publish 
less, with only a few die-hards continuing to suck 
on the dry teats of their superannuated 
paradigms. There seems to be a general lack of 
intellectual ferment in the field. 

Was speech perception research (as defined
rather narrowly here) just a historic episode? Did 
Bregman in his chapter on duplex perception cap- 
ture the last, already somewhat peripheral con- 
troversy? Perhaps, as far as the dominant and 
unifying (or, rather, constructively divisive) role of 
Haskins Laboratories is concerned. It may take a 
while before new ideas develop and strong voices 
emerge to put them forward. Handel's book will 
make only a minor contribution here; what is 
needed is a coherent body of research that makes 
a point, comparable to what Krumhansl and 
Bregman have to offer. The ideas of the most in- 
novative theorist in recent years, Carol Fowler, 
hold much promise but have not yet resulted in a 
critical mass of empirical findings. Meanwhile, of 
course, there is a rich matrix of related research 
activities in experimental phonetics, speech sci- 
ence, speech technology, and psycholinguistics, 
which are not experiencing a similar recession and
may provide hotbeds for new directions in speech 
perception research. 

MUSIC PERCEPTION 
Music psychology, and particularly research on 
music perception, is on the rise. One of the prime 
movers is Diana Deutsch, who has earned the so-
cio-scientific triple crown by editing the first mod-
ern collection of articles on the subject (Deutsch,
1982), founding the journal Music Perception in
1983, and by recently establishing the Society for
Music Perception and Cognition, in addition to be-
ing a fertile and original researcher at the psy-
choacoustic end of the music spectrum. A number
of other significant books have appeared in recent
years, of which Sloboda's (1985) is the most origi-
nal, and music-related conferences abound. There
is a rapidly increasing pool of talented young in-
vestigators, each of whom quickly seems to find a
niche in the vast territory offered by musical ques-
tions and phenomena. Interdisciplinary confer-
ences bring psychologists together with music 
technologists, musicologists, composers, and per- 
formers — some sceptical, to be sure, but all eager 
to exchange ideas and explore new avenues. 
Electronic instruments, sophisticated software, 
and MIDI systems offer new and exciting possibil- 
ities for research and practice. As an added special 
touch, a shared love for music unites scientists of 
very different theoretical persuasions: The fact 
that music gives aesthetic pleasure and spiritual 
sustenance is never far from their minds and fre-
quently invades their discussions, whereas speech
researchers, for example, rarely think of drama or
poetry in connection with their work. 

Krumhansl's book rides the crest of a wave to
which she herself contributed significantly. The 
book serves mainly to bring her work to the at- 
tention of those who have not been following her 
progress in the journals, and it is admirably 
suited for that purpose. The research itself, of 
course, has led and influenced the field for some 
time, and also has aroused some controversy 
(Butler, 1989; Krumhansl, 1990; Butler, 1990)— a 
healthy sign of a science's vitality (cf. Hull, 1988). 
Unlike Bregman's life work, which has the quality
of a fortress under construction, with open doors 
but numerous escape routes, all thoroughly ex- 
plored in advance of any possible attack, 
Krumhansl's work, with its carefully planned de-
sign, its built-in dependencies among experiments,
and its strong reliance on one particular 
methodology, appears much more vulnerable and 
transparent, more like a contemporary office 
building in a historic neighborhood. It remains to 
be seen whether her constructs and methods can 
withstand critical onslaught. Meanwhile, her book 
is required reading for anyone interested in the 
contemporary psychology of music, as indeed are 
the other two volumes reviewed here. In concert, 
this admirable trio should convince anyone that
auditory perception is worthy of much more at- 
tention by psychologists than it has received in 
the past.

FOOTNOTES

^1Naturally, these are just selected highlights which happened to
leave a strong impression on me.

^2What is missing from my shelf is a research monograph on
speech perception from a cognitive or ecological perspective.

^3A useful supplement to the book would have been a soundsheet
or CD illustrating the auditory phenomena discussed in the
book. Some years ago, Bregman produced a cassette with such
demonstrations and distributed copies to colleagues; I
understand that copies can still be obtained by sending $5.00 to
him at McGill University.